Importing the libraries required for Telco Customer Churn data analysis and prediction¶

Here, the dataset is downloaded from https://www.kaggle.com/datasets/blastchar/telco-customer-churn?resource=download

This dataset consists of 7043 rows (records) and 21 columns (features): 20 independent features and 1 dependent feature ("Churn"). The problem statement is to understand why customers are leaving the business (churn simply means a customer discontinuing service). We have to provide the stakeholders with insights into this problem and build a model that predicts customer churn.
In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import plotly.express as plx
In [2]:
# reading the csv file from the current directory
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
In [3]:
# printing the df variable which holds churn dataset from IBM
df
Out[3]:
customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity ... DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 7590-VHVEG Female 0 Yes No 1 No No phone service DSL No ... No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 5575-GNVDE Male 0 No No 34 Yes No DSL Yes ... Yes No No No One year No Mailed check 56.95 1889.5 No
2 3668-QPYBK Male 0 No No 2 Yes No DSL Yes ... No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
3 7795-CFOCW Male 0 No No 45 No No phone service DSL Yes ... Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
4 9237-HQITU Female 0 No No 2 Yes No Fiber optic No ... No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 6840-RESVB Male 0 Yes Yes 24 Yes Yes DSL Yes ... Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No
7039 2234-XADUH Female 0 Yes Yes 72 Yes Yes Fiber optic No ... Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No
7040 4801-JZAZL Female 0 Yes Yes 11 No No phone service DSL Yes ... No No No No Month-to-month Yes Electronic check 29.60 346.45 No
7041 8361-LTMKD Male 1 Yes No 4 Yes Yes Fiber optic No ... No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes
7042 3186-AJIEK Male 0 No No 66 Yes No Fiber optic Yes ... Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No

7043 rows × 21 columns

In [4]:
# describe() gives a statistical summary of the numeric columns in the dataset (df)
df.describe()
Out[4]:
SeniorCitizen tenure MonthlyCharges
count 7043.000000 7043.000000 7043.000000
mean 0.162147 32.371149 64.761692
std 0.368612 24.559481 30.090047
min 0.000000 0.000000 18.250000
25% 0.000000 9.000000 35.500000
50% 0.000000 29.000000 70.350000
75% 0.000000 55.000000 89.850000
max 1.000000 72.000000 118.750000
In [5]:
# dtypes gives the datatypes of the individual features in the dataframe
df.dtypes
Out[5]:
customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object
In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 
 17  PaymentMethod     7043 non-null   object 
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7043 non-null   object 
 20  Churn             7043 non-null   object 
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

Only a few of the features have an integer/float datatype. We have to remove some unwanted features and perform some feature engineering.¶
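
One dtype worth flagging from the df.info() output above: 'TotalCharges' is stored as object even though it holds dollar amounts, likely because some entries are non-numeric (blank) strings. A minimal sketch of the fix with pd.to_numeric, using a hypothetical toy frame rather than the real CSV:

```python
import pandas as pd

# Toy frame mimicking the issue: TotalCharges holds numbers as strings,
# with a blank entry (as happens for zero-tenure customers in this dataset).
toy = pd.DataFrame({'tenure': [1, 0, 34],
                    'TotalCharges': ['29.85', ' ', '1889.5']})

# errors='coerce' turns non-numeric strings (like ' ') into NaN
toy['TotalCharges'] = pd.to_numeric(toy['TotalCharges'], errors='coerce')

print(toy['TotalCharges'].isnull().sum())  # -> 1 row needing imputation or removal
```

After the cast, the NaN rows can be dropped or imputed before modeling.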

Feature Engineering¶

1. The first feature is 'customerID'. An arbitrary identifier cannot be a reason for churn, so we remove the feature 'customerID'.¶

In [7]:
df = df.drop(['customerID'], axis = 1)
In [8]:
df
Out[8]:
gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 Female 0 Yes No 1 No No phone service DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 Male 0 No No 34 Yes No DSL Yes No Yes No No No One year No Mailed check 56.95 1889.5 No
2 Male 0 No No 2 Yes No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
3 Male 0 No No 45 No No phone service DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
4 Female 0 No No 2 Yes No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 Male 0 Yes Yes 24 Yes Yes DSL Yes No Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No
7039 Female 0 Yes Yes 72 Yes Yes Fiber optic No Yes Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No
7040 Female 0 Yes Yes 11 No No phone service DSL Yes No No No No No Month-to-month Yes Electronic check 29.60 346.45 No
7041 Male 1 Yes No 4 Yes Yes Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes
7042 Male 0 No No 66 Yes No Fiber optic Yes No Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No

7043 rows × 20 columns

2. The second feature is 'gender'. Let's look at the feature in depth.¶

In [9]:
df['gender'].isnull().sum()
Out[9]:
0

There are no null values in the 'gender' feature.

In [10]:
df['gender'].unique()
Out[10]:
array(['Female', 'Male'], dtype=object)

There are two unique values in the 'gender' feature (Female and Male). Let's one-hot encode this feature using get_dummies in pandas.
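
To see what drop_first=True does before applying it to the real column, here is a toy sketch (hypothetical data, not the churn frame): with k categories, get_dummies produces k indicator columns, and drop_first keeps k-1 of them, since the dropped column is fully determined by the others.

```python
import pandas as pd

s = pd.Series(['Female', 'Male', 'Male', 'Female'])

full = pd.get_dummies(s, prefix='gender', dtype='int')      # both indicators
reduced = pd.get_dummies(s, prefix='gender', dtype='int',
                         drop_first=True)                   # redundant column dropped

print(list(full.columns))     # ['gender_Female', 'gender_Male']
print(list(reduced.columns))  # ['gender_Male']
```

For a binary feature this leaves a single 0/1 column, which is exactly what we get for 'gender' below.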

In [11]:
gender = pd.get_dummies(df['gender'], prefix = 'gender', dtype = 'int', drop_first = True)
In [12]:
gender
Out[12]:
gender_Male
0 0
1 1
2 1
3 1
4 0
... ...
7038 1
7039 0
7040 0
7041 1
7042 1

7043 rows × 1 columns

Removing the gender column from the original dataframe.

In [13]:
df = df.drop(['gender'], axis = 1)
In [14]:
df
Out[14]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
0 0 Yes No 1 No No phone service DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No
1 0 No No 34 Yes No DSL Yes No Yes No No No One year No Mailed check 56.95 1889.5 No
2 0 No No 2 Yes No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
3 0 No No 45 No No phone service DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
4 0 No No 2 Yes No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 Yes Yes 24 Yes Yes DSL Yes No Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No
7039 0 Yes Yes 72 Yes Yes Fiber optic No Yes Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No
7040 0 Yes Yes 11 No No phone service DSL Yes No No No No No Month-to-month Yes Electronic check 29.60 346.45 No
7041 1 Yes No 4 Yes Yes Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes
7042 0 No No 66 Yes No Fiber optic Yes No Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No

7043 rows × 19 columns

Now, we use concat from pandas to join 'df' and 'gender'.

In [15]:
df = pd.concat((df, gender), axis = 1)
In [16]:
df
Out[16]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male
0 0 Yes No 1 No No phone service DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No 0
1 0 No No 34 Yes No DSL Yes No Yes No No No One year No Mailed check 56.95 1889.5 No 1
2 0 No No 2 Yes No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes 1
3 0 No No 45 No No phone service DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No 1
4 0 No No 2 Yes No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 Yes Yes 24 Yes Yes DSL Yes No Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No 1
7039 0 Yes Yes 72 Yes Yes Fiber optic No Yes Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No 0
7040 0 Yes Yes 11 No No phone service DSL Yes No No No No No Month-to-month Yes Electronic check 29.60 346.45 No 0
7041 1 Yes No 4 Yes Yes Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes 1
7042 0 No No 66 Yes No Fiber optic Yes No Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1

7043 rows × 20 columns

3. The next feature is 'Partner'. Let's do feature engineering on it.¶

In [17]:
df['Partner'].isnull().sum()
Out[17]:
0
In [18]:
df['Partner'].unique()
Out[18]:
array(['Yes', 'No'], dtype=object)

There are only two unique values ('Yes' and 'No'). Let's do label encoding: 'Yes' is replaced with 1 and 'No' with 0.
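
One caveat worth knowing: LabelEncoder assigns codes by sorting the labels alphabetically, and because 'No' < 'Yes' this happens to give exactly the No→0, Yes→1 mapping we want here. An explicit map spells the intent out; a sketch on a hypothetical toy series:

```python
import pandas as pd
from sklearn import preprocessing

s = pd.Series(['Yes', 'No', 'No', 'Yes'])

le = preprocessing.LabelEncoder()
encoded = le.fit_transform(s)          # alphabetical order: 'No' -> 0, 'Yes' -> 1

explicit = s.map({'No': 0, 'Yes': 1})  # same result, mapping made explicit

print(list(encoded))       # [1, 0, 0, 1]
print(explicit.tolist())   # [1, 0, 0, 1]
```

Either approach works for the Yes/No columns in this notebook; the map variant just makes the encoding self-documenting.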

In [19]:
le = preprocessing.LabelEncoder()
In [20]:
df['Partner'] = le.fit_transform(df['Partner'])
In [21]:
df
Out[21]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male
0 0 1 No 1 No No phone service DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No 0
1 0 0 No 34 Yes No DSL Yes No Yes No No No One year No Mailed check 56.95 1889.5 No 1
2 0 0 No 2 Yes No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes 1
3 0 0 No 45 No No phone service DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No 1
4 0 0 No 2 Yes No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 Yes 24 Yes Yes DSL Yes No Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No 1
7039 0 1 Yes 72 Yes Yes Fiber optic No Yes Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No 0
7040 0 1 Yes 11 No No phone service DSL Yes No No No No No Month-to-month Yes Electronic check 29.60 346.45 No 0
7041 1 1 No 4 Yes Yes Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes 1
7042 0 0 No 66 Yes No Fiber optic Yes No Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1

7043 rows × 20 columns

4. The 'Dependents' feature¶

In [22]:
df['Dependents'].isnull().sum()
Out[22]:
0
In [23]:
df['Dependents'].unique()
Out[23]:
array(['No', 'Yes'], dtype=object)

There are only two unique values ('Yes' and 'No'). Let's do label encoding: 'Yes' is replaced with 1 and 'No' with 0.

In [24]:
df['Dependents'] = le.fit_transform(df['Dependents'])
In [25]:
df
Out[25]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male
0 0 1 0 1 No No phone service DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No 0
1 0 0 0 34 Yes No DSL Yes No Yes No No No One year No Mailed check 56.95 1889.5 No 1
2 0 0 0 2 Yes No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes 1
3 0 0 0 45 No No phone service DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No 1
4 0 0 0 2 Yes No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 Yes Yes DSL Yes No Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No 1
7039 0 1 1 72 Yes Yes Fiber optic No Yes Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No 0
7040 0 1 1 11 No No phone service DSL Yes No No No No No Month-to-month Yes Electronic check 29.60 346.45 No 0
7041 1 1 0 4 Yes Yes Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes 1
7042 0 0 0 66 Yes No Fiber optic Yes No Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1

7043 rows × 20 columns

5. PhoneService¶

In [26]:
df['PhoneService'].isnull().sum()
Out[26]:
0
In [27]:
df['PhoneService'].unique()
Out[27]:
array(['No', 'Yes'], dtype=object)

There are only two unique values ('Yes' and 'No'). Let's do label encoding: 'Yes' is replaced with 1 and 'No' with 0.

In [28]:
df['PhoneService'] = le.fit_transform(df['PhoneService'])
In [29]:
df
Out[29]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male
0 0 1 0 1 0 No phone service DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No 0
1 0 0 0 34 1 No DSL Yes No Yes No No No One year No Mailed check 56.95 1889.5 No 1
2 0 0 0 2 1 No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes 1
3 0 0 0 45 0 No phone service DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No 1
4 0 0 0 2 1 No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 Yes DSL Yes No Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No 1
7039 0 1 1 72 1 Yes Fiber optic No Yes Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No 0
7040 0 1 1 11 0 No phone service DSL Yes No No No No No Month-to-month Yes Electronic check 29.60 346.45 No 0
7041 1 1 0 4 1 Yes Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes 1
7042 0 0 0 66 1 No Fiber optic Yes No Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1

7043 rows × 20 columns

6. MultipleLines¶

In [30]:
df['MultipleLines'].isnull().sum()
Out[30]:
0
In [31]:
df['MultipleLines'].unique()
Out[31]:
array(['No phone service', 'No', 'Yes'], dtype=object)
In [32]:
df.loc[df['MultipleLines'] == "No phone service", "MultipleLines"] = "No"
In [33]:
df
Out[33]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male
0 0 1 0 1 0 No DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No 0
1 0 0 0 34 1 No DSL Yes No Yes No No No One year No Mailed check 56.95 1889.5 No 1
2 0 0 0 2 1 No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes 1
3 0 0 0 45 0 No DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No 1
4 0 0 0 2 1 No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 Yes DSL Yes No Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No 1
7039 0 1 1 72 1 Yes Fiber optic No Yes Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No 0
7040 0 1 1 11 0 No DSL Yes No No No No No Month-to-month Yes Electronic check 29.60 346.45 No 0
7041 1 1 0 4 1 Yes Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes 1
7042 0 0 0 66 1 No Fiber optic Yes No Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1

7043 rows × 20 columns

In [34]:
df['MultipleLines'].unique()
Out[34]:
array(['No', 'Yes'], dtype=object)

We changed 'No phone service' to 'No'. Now we can perform label encoding using the object 'le'.
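
The two-step .loc assignment used above can equally be written as a single replace call; a minimal sketch on a hypothetical toy series with the same three values:

```python
import pandas as pd

s = pd.Series(['No phone service', 'No', 'Yes'])

# Collapse the third category into 'No' in one step
s = s.replace({'No phone service': 'No'})

print(sorted(s.unique()))  # ['No', 'Yes']
```

Both forms leave the column with only 'No'/'Yes', ready for label encoding.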

In [35]:
df['MultipleLines'] = le.fit_transform(df['MultipleLines'])
In [36]:
df
Out[36]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male
0 0 1 0 1 0 0 DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No 0
1 0 0 0 34 1 0 DSL Yes No Yes No No No One year No Mailed check 56.95 1889.5 No 1
2 0 0 0 2 1 0 DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes 1
3 0 0 0 45 0 0 DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No 1
4 0 0 0 2 1 0 Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 DSL Yes No Yes Yes Yes Yes One year Yes Mailed check 84.80 1990.5 No 1
7039 0 1 1 72 1 1 Fiber optic No Yes Yes No Yes Yes One year Yes Credit card (automatic) 103.20 7362.9 No 0
7040 0 1 1 11 0 0 DSL Yes No No No No No Month-to-month Yes Electronic check 29.60 346.45 No 0
7041 1 1 0 4 1 1 Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes 1
7042 0 0 0 66 1 0 Fiber optic Yes No Yes Yes Yes Yes Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1

7043 rows × 20 columns

7. InternetService¶

In [37]:
df['InternetService'].isnull().sum()
Out[37]:
0
In [38]:
df['InternetService'].unique()
Out[38]:
array(['DSL', 'Fiber optic', 'No'], dtype=object)

Let's perform one-hot encoding for the feature 'InternetService', since it has three unordered categories.

In [39]:
internet_service = pd.get_dummies(df['InternetService'], prefix = 'InternetService', dtype = 'int')

Dropping the feature 'InternetService' and including the encoded features from 'internet_service'.

In [40]:
df = df.drop(['InternetService'], axis = 1)
In [41]:
df = pd.concat((df, internet_service), axis = 1)
In [42]:
df
Out[42]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport ... Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No
0 0 1 0 1 0 0 No Yes No No ... Month-to-month Yes Electronic check 29.85 29.85 No 0 1 0 0
1 0 0 0 34 1 0 Yes No Yes No ... One year No Mailed check 56.95 1889.5 No 1 1 0 0
2 0 0 0 2 1 0 Yes Yes No No ... Month-to-month Yes Mailed check 53.85 108.15 Yes 1 1 0 0
3 0 0 0 45 0 0 Yes No Yes Yes ... One year No Bank transfer (automatic) 42.30 1840.75 No 1 1 0 0
4 0 0 0 2 1 0 No No No No ... Month-to-month Yes Electronic check 70.70 151.65 Yes 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 Yes No Yes Yes ... One year Yes Mailed check 84.80 1990.5 No 1 1 0 0
7039 0 1 1 72 1 1 No Yes Yes No ... One year Yes Credit card (automatic) 103.20 7362.9 No 0 0 1 0
7040 0 1 1 11 0 0 Yes No No No ... Month-to-month Yes Electronic check 29.60 346.45 No 0 1 0 0
7041 1 1 0 4 1 1 No No No No ... Month-to-month Yes Mailed check 74.40 306.6 Yes 1 0 1 0
7042 0 0 0 66 1 0 Yes No Yes Yes ... Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1 0 1 0

7043 rows × 22 columns

8. OnlineSecurity¶

In [43]:
df['OnlineSecurity'].isnull().sum()
Out[43]:
0
In [44]:
df['OnlineSecurity'].unique()
Out[44]:
array(['No', 'Yes', 'No internet service'], dtype=object)
In [45]:
df.loc[df['OnlineSecurity'] == "No internet service", "OnlineSecurity"] = "No"

We changed 'No internet service' to 'No'. Now we can perform label encoding using the object 'le'.

In [46]:
df['OnlineSecurity'].unique()
Out[46]:
array(['No', 'Yes'], dtype=object)
In [47]:
df
Out[47]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport ... Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No
0 0 1 0 1 0 0 No Yes No No ... Month-to-month Yes Electronic check 29.85 29.85 No 0 1 0 0
1 0 0 0 34 1 0 Yes No Yes No ... One year No Mailed check 56.95 1889.5 No 1 1 0 0
2 0 0 0 2 1 0 Yes Yes No No ... Month-to-month Yes Mailed check 53.85 108.15 Yes 1 1 0 0
3 0 0 0 45 0 0 Yes No Yes Yes ... One year No Bank transfer (automatic) 42.30 1840.75 No 1 1 0 0
4 0 0 0 2 1 0 No No No No ... Month-to-month Yes Electronic check 70.70 151.65 Yes 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 Yes No Yes Yes ... One year Yes Mailed check 84.80 1990.5 No 1 1 0 0
7039 0 1 1 72 1 1 No Yes Yes No ... One year Yes Credit card (automatic) 103.20 7362.9 No 0 0 1 0
7040 0 1 1 11 0 0 Yes No No No ... Month-to-month Yes Electronic check 29.60 346.45 No 0 1 0 0
7041 1 1 0 4 1 1 No No No No ... Month-to-month Yes Mailed check 74.40 306.6 Yes 1 0 1 0
7042 0 0 0 66 1 0 Yes No Yes Yes ... Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1 0 1 0

7043 rows × 22 columns

In [48]:
df['OnlineSecurity'] = le.fit_transform(df['OnlineSecurity'])
In [49]:
df
Out[49]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport ... Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No
0 0 1 0 1 0 0 0 Yes No No ... Month-to-month Yes Electronic check 29.85 29.85 No 0 1 0 0
1 0 0 0 34 1 0 1 No Yes No ... One year No Mailed check 56.95 1889.5 No 1 1 0 0
2 0 0 0 2 1 0 1 Yes No No ... Month-to-month Yes Mailed check 53.85 108.15 Yes 1 1 0 0
3 0 0 0 45 0 0 1 No Yes Yes ... One year No Bank transfer (automatic) 42.30 1840.75 No 1 1 0 0
4 0 0 0 2 1 0 0 No No No ... Month-to-month Yes Electronic check 70.70 151.65 Yes 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 1 No Yes Yes ... One year Yes Mailed check 84.80 1990.5 No 1 1 0 0
7039 0 1 1 72 1 1 0 Yes Yes No ... One year Yes Credit card (automatic) 103.20 7362.9 No 0 0 1 0
7040 0 1 1 11 0 0 1 No No No ... Month-to-month Yes Electronic check 29.60 346.45 No 0 1 0 0
7041 1 1 0 4 1 1 0 No No No ... Month-to-month Yes Mailed check 74.40 306.6 Yes 1 0 1 0
7042 0 0 0 66 1 0 1 No Yes Yes ... Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1 0 1 0

7043 rows × 22 columns

9. OnlineBackup¶

In [50]:
df['OnlineBackup'].isnull().sum()
Out[50]:
0
In [51]:
df['OnlineBackup'].unique()
Out[51]:
array(['Yes', 'No', 'No internet service'], dtype=object)
In [52]:
df.loc[df['OnlineBackup'] == "No internet service", "OnlineBackup"] = "No"
In [53]:
df['OnlineBackup'].unique()
Out[53]:
array(['Yes', 'No'], dtype=object)
In [54]:
df['OnlineBackup'] = le.fit_transform(df['OnlineBackup'])
In [55]:
df
Out[55]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport ... Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No
0 0 1 0 1 0 0 0 1 No No ... Month-to-month Yes Electronic check 29.85 29.85 No 0 1 0 0
1 0 0 0 34 1 0 1 0 Yes No ... One year No Mailed check 56.95 1889.5 No 1 1 0 0
2 0 0 0 2 1 0 1 1 No No ... Month-to-month Yes Mailed check 53.85 108.15 Yes 1 1 0 0
3 0 0 0 45 0 0 1 0 Yes Yes ... One year No Bank transfer (automatic) 42.30 1840.75 No 1 1 0 0
4 0 0 0 2 1 0 0 0 No No ... Month-to-month Yes Electronic check 70.70 151.65 Yes 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 1 0 Yes Yes ... One year Yes Mailed check 84.80 1990.5 No 1 1 0 0
7039 0 1 1 72 1 1 0 1 Yes No ... One year Yes Credit card (automatic) 103.20 7362.9 No 0 0 1 0
7040 0 1 1 11 0 0 1 0 No No ... Month-to-month Yes Electronic check 29.60 346.45 No 0 1 0 0
7041 1 1 0 4 1 1 0 0 No No ... Month-to-month Yes Mailed check 74.40 306.6 Yes 1 0 1 0
7042 0 0 0 66 1 0 1 0 Yes Yes ... Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1 0 1 0

7043 rows × 22 columns

10. DeviceProtection¶

In [56]:
df['DeviceProtection'].isnull().sum()
Out[56]:
0
In [57]:
df['DeviceProtection'].unique()
Out[57]:
array(['No', 'Yes', 'No internet service'], dtype=object)
In [58]:
df.loc[df['DeviceProtection'] == "No internet service", "DeviceProtection"] = "No"
In [59]:
df['DeviceProtection'].unique()
Out[59]:
array(['No', 'Yes'], dtype=object)
In [60]:
df['DeviceProtection'] = le.fit_transform(df['DeviceProtection'])
In [61]:
df
Out[61]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport ... Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No
0 0 1 0 1 0 0 0 1 0 No ... Month-to-month Yes Electronic check 29.85 29.85 No 0 1 0 0
1 0 0 0 34 1 0 1 0 1 No ... One year No Mailed check 56.95 1889.5 No 1 1 0 0
2 0 0 0 2 1 0 1 1 0 No ... Month-to-month Yes Mailed check 53.85 108.15 Yes 1 1 0 0
3 0 0 0 45 0 0 1 0 1 Yes ... One year No Bank transfer (automatic) 42.30 1840.75 No 1 1 0 0
4 0 0 0 2 1 0 0 0 0 No ... Month-to-month Yes Electronic check 70.70 151.65 Yes 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 1 0 1 Yes ... One year Yes Mailed check 84.80 1990.5 No 1 1 0 0
7039 0 1 1 72 1 1 0 1 1 No ... One year Yes Credit card (automatic) 103.20 7362.9 No 0 0 1 0
7040 0 1 1 11 0 0 1 0 0 No ... Month-to-month Yes Electronic check 29.60 346.45 No 0 1 0 0
7041 1 1 0 4 1 1 0 0 0 No ... Month-to-month Yes Mailed check 74.40 306.6 Yes 1 0 1 0
7042 0 0 0 66 1 0 1 0 1 Yes ... Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1 0 1 0

7043 rows × 22 columns

11. TechSupport¶

In [62]:
df['TechSupport'].isnull().sum()
Out[62]:
0
In [63]:
df['TechSupport'].unique()
Out[63]:
array(['No', 'Yes', 'No internet service'], dtype=object)
In [64]:
df.loc[df['TechSupport'] == "No internet service", "TechSupport"] = "No"
In [65]:
df['TechSupport'].unique()
Out[65]:
array(['No', 'Yes'], dtype=object)
In [66]:
df['TechSupport'] = le.fit_transform(df['TechSupport'])
In [67]:
df
Out[67]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport ... Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No
0 0 1 0 1 0 0 0 1 0 0 ... Month-to-month Yes Electronic check 29.85 29.85 No 0 1 0 0
1 0 0 0 34 1 0 1 0 1 0 ... One year No Mailed check 56.95 1889.5 No 1 1 0 0
2 0 0 0 2 1 0 1 1 0 0 ... Month-to-month Yes Mailed check 53.85 108.15 Yes 1 1 0 0
3 0 0 0 45 0 0 1 0 1 1 ... One year No Bank transfer (automatic) 42.30 1840.75 No 1 1 0 0
4 0 0 0 2 1 0 0 0 0 0 ... Month-to-month Yes Electronic check 70.70 151.65 Yes 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 1 0 1 1 ... One year Yes Mailed check 84.80 1990.5 No 1 1 0 0
7039 0 1 1 72 1 1 0 1 1 0 ... One year Yes Credit card (automatic) 103.20 7362.9 No 0 0 1 0
7040 0 1 1 11 0 0 1 0 0 0 ... Month-to-month Yes Electronic check 29.60 346.45 No 0 1 0 0
7041 1 1 0 4 1 1 0 0 0 0 ... Month-to-month Yes Mailed check 74.40 306.6 Yes 1 0 1 0
7042 0 0 0 66 1 0 1 0 1 1 ... Two year Yes Bank transfer (automatic) 105.65 6844.5 No 1 0 1 0

7043 rows × 22 columns

In [68]:
df.dtypes
Out[68]:
SeniorCitizen                    int64
Partner                          int32
Dependents                       int32
tenure                           int64
PhoneService                     int32
MultipleLines                    int32
OnlineSecurity                   int32
OnlineBackup                     int32
DeviceProtection                 int32
TechSupport                      int32
StreamingTV                     object
StreamingMovies                 object
Contract                        object
PaperlessBilling                object
PaymentMethod                   object
MonthlyCharges                 float64
TotalCharges                    object
Churn                           object
gender_Male                      int32
InternetService_DSL              int32
InternetService_Fiber optic      int32
InternetService_No               int32
dtype: object

12. StreamingTV¶

In [69]:
df['StreamingTV'].isnull().sum()
Out[69]:
0
In [70]:
df['StreamingTV'].unique()
Out[70]:
array(['No', 'Yes', 'No internet service'], dtype=object)
In [71]:
df.loc[df['StreamingTV'] == "No internet service", "StreamingTV"] = "No"
In [72]:
df['StreamingTV'].unique()
Out[72]:
array(['No', 'Yes'], dtype=object)
In [73]:
df['StreamingTV'] = le.fit_transform(df['StreamingTV'])
In [74]:
df['StreamingTV']
Out[74]:
0       0
1       0
2       0
3       0
4       0
       ..
7038    1
7039    1
7040    0
7041    0
7042    1
Name: StreamingTV, Length: 7043, dtype: int32

13. StreamingMovies¶

In [75]:
df['StreamingMovies'].isnull().sum()
Out[75]:
0
In [76]:
df['StreamingMovies'].unique()
Out[76]:
array(['No', 'Yes', 'No internet service'], dtype=object)
In [77]:
df.loc[df['StreamingMovies'] == "No internet service", "StreamingMovies"] = "No"
In [78]:
df['StreamingMovies'].unique()
Out[78]:
array(['No', 'Yes'], dtype=object)
In [79]:
df['StreamingMovies'] = le.fit_transform(df['StreamingMovies'])
In [80]:
df['StreamingMovies']
Out[80]:
0       0
1       0
2       0
3       0
4       0
       ..
7038    1
7039    1
7040    0
7041    0
7042    1
Name: StreamingMovies, Length: 7043, dtype: int32

14. Contract¶

In [81]:
df['Contract'].isnull().sum()
Out[81]:
0
In [82]:
df['Contract'].unique()
Out[82]:
array(['Month-to-month', 'One year', 'Two year'], dtype=object)

We have to perform one-hot encoding here, since 'Contract' has three unordered categories.

In [83]:
contract = pd.get_dummies(df['Contract'], prefix = 'Contract', dtype = 'int' )
In [84]:
contract
Out[84]:
Contract_Month-to-month Contract_One year Contract_Two year
0 1 0 0
1 0 1 0
2 1 0 0
3 0 1 0
4 1 0 0
... ... ... ...
7038 0 1 0
7039 0 1 0
7040 1 0 0
7041 1 0 0
7042 0 0 1

7043 rows × 3 columns

In [85]:
df = df.drop(['Contract'], axis = 1)
In [86]:
df = pd.concat((df, contract), axis = 1)
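The drop-then-concat pair above can also be collapsed into one call: `pd.get_dummies` accepts a `columns=` argument that encodes the named column and splices the dummies in for you. A sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({'tenure': [1, 34, 66],
                   'Contract': ['Month-to-month', 'One year', 'Two year']})

# columns= encodes the named column and drops the original in one step
df = pd.get_dummies(df, columns=['Contract'], dtype=int)

print(list(df.columns))
# ['tenure', 'Contract_Month-to-month', 'Contract_One year', 'Contract_Two year']
```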
In [87]:
df
Out[87]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport ... MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_One year Contract_Two year
0 0 1 0 1 0 0 0 1 0 0 ... 29.85 29.85 No 0 1 0 0 1 0 0
1 0 0 0 34 1 0 1 0 1 0 ... 56.95 1889.5 No 1 1 0 0 0 1 0
2 0 0 0 2 1 0 1 1 0 0 ... 53.85 108.15 Yes 1 1 0 0 1 0 0
3 0 0 0 45 0 0 1 0 1 1 ... 42.30 1840.75 No 1 1 0 0 0 1 0
4 0 0 0 2 1 0 0 0 0 0 ... 70.70 151.65 Yes 0 0 1 0 1 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 1 0 1 1 ... 84.80 1990.5 No 1 1 0 0 0 1 0
7039 0 1 1 72 1 1 0 1 1 0 ... 103.20 7362.9 No 0 0 1 0 0 1 0
7040 0 1 1 11 0 0 1 0 0 0 ... 29.60 346.45 No 0 1 0 0 1 0 0
7041 1 1 0 4 1 1 0 0 0 0 ... 74.40 306.6 Yes 1 0 1 0 1 0 0
7042 0 0 0 66 1 0 1 0 1 1 ... 105.65 6844.5 No 1 0 1 0 0 0 1

7043 rows × 24 columns

15. PaperlessBilling¶

In [88]:
df['PaperlessBilling'].isnull().sum()
Out[88]:
0
In [89]:
df['PaperlessBilling'].unique()
Out[89]:
array(['Yes', 'No'], dtype=object)
In [90]:
df['PaperlessBilling'] = le.fit_transform(df['PaperlessBilling'])
In [91]:
df['PaperlessBilling']
Out[91]:
0       1
1       0
2       1
3       0
4       1
       ..
7038    1
7039    1
7040    1
7041    1
7042    1
Name: PaperlessBilling, Length: 7043, dtype: int32

16. PaymentMethod¶

In [92]:
df['PaymentMethod'].isnull().sum()
Out[92]:
0
In [93]:
df['PaymentMethod'].unique()
Out[93]:
array(['Electronic check', 'Mailed check', 'Bank transfer (automatic)',
       'Credit card (automatic)'], dtype=object)
In [94]:
PaymentMethod = pd.get_dummies(df['PaymentMethod'], prefix = 'PaymentMethod', dtype = 'int')
In [95]:
PaymentMethod
Out[95]:
PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check
0 0 0 1 0
1 0 0 0 1
2 0 0 0 1
3 1 0 0 0
4 0 0 1 0
... ... ... ... ...
7038 0 0 0 1
7039 0 1 0 0
7040 0 0 1 0
7041 0 0 0 1
7042 1 0 0 0

7043 rows × 4 columns

In [96]:
df = df.drop(['PaymentMethod'], axis = 1)
In [97]:
df = pd.concat((df, PaymentMethod), axis = 1)
In [98]:
df
Out[98]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport ... InternetService_DSL InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_One year Contract_Two year PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check
0 0 1 0 1 0 0 0 1 0 0 ... 1 0 0 1 0 0 0 0 1 0
1 0 0 0 34 1 0 1 0 1 0 ... 1 0 0 0 1 0 0 0 0 1
2 0 0 0 2 1 0 1 1 0 0 ... 1 0 0 1 0 0 0 0 0 1
3 0 0 0 45 0 0 1 0 1 1 ... 1 0 0 0 1 0 1 0 0 0
4 0 0 0 2 1 0 0 0 0 0 ... 0 1 0 1 0 0 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 1 0 1 1 ... 1 0 0 0 1 0 0 0 0 1
7039 0 1 1 72 1 1 0 1 1 0 ... 0 1 0 0 1 0 0 1 0 0
7040 0 1 1 11 0 0 1 0 0 0 ... 1 0 0 1 0 0 0 0 1 0
7041 1 1 0 4 1 1 0 0 0 0 ... 0 1 0 1 0 0 0 0 0 1
7042 0 0 0 66 1 0 1 0 1 1 ... 0 1 0 0 0 1 1 0 0 0

7043 rows × 27 columns

In [99]:
df.dtypes
Out[99]:
SeniorCitizen                                int64
Partner                                      int32
Dependents                                   int32
tenure                                       int64
PhoneService                                 int32
MultipleLines                                int32
OnlineSecurity                               int32
OnlineBackup                                 int32
DeviceProtection                             int32
TechSupport                                  int32
StreamingTV                                  int32
StreamingMovies                              int32
PaperlessBilling                             int32
MonthlyCharges                             float64
TotalCharges                                object
Churn                                       object
gender_Male                                  int32
InternetService_DSL                          int32
InternetService_Fiber optic                  int32
InternetService_No                           int32
Contract_Month-to-month                      int32
Contract_One year                            int32
Contract_Two year                            int32
PaymentMethod_Bank transfer (automatic)      int32
PaymentMethod_Credit card (automatic)        int32
PaymentMethod_Electronic check               int32
PaymentMethod_Mailed check                   int32
dtype: object

17. TotalCharges¶

In [100]:
df['TotalCharges'].isnull().sum()
Out[100]:
0
In [101]:
df['TotalCharges']
Out[101]:
0         29.85
1        1889.5
2        108.15
3       1840.75
4        151.65
         ...   
7038     1990.5
7039     7362.9
7040     346.45
7041      306.6
7042     6844.5
Name: TotalCharges, Length: 7043, dtype: object

The values above all look like floats, but the dtype is object. So, we have to convert the column from object to float.

In [102]:
 #df['TotalCharges'] = pd.to_numeric(df['TotalCharges'])

Running the conversion raises an error at index 488: it fails to parse an empty string as a number. We have to handle this first.¶

In [103]:
df['TotalCharges'][488]
Out[103]:
' '
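One way to locate every such blank in one pass is `pd.to_numeric` with `errors='coerce'`, which turns unparseable strings into NaN instead of raising. A small sketch:

```python
import pandas as pd

s = pd.Series(['29.85', ' ', '1889.5', ' '])

# errors='coerce' turns unparseable strings into NaN instead of raising,
# so the offending positions can be located with isna()
nums = pd.to_numeric(s, errors='coerce')
bad_idx = s[nums.isna()].index.tolist()

print(bad_idx)  # [1, 3]
```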
In [104]:
df.iloc[488]
Out[104]:
SeniorCitizen                                  0
Partner                                        1
Dependents                                     1
tenure                                         0
PhoneService                                   0
MultipleLines                                  0
OnlineSecurity                                 1
OnlineBackup                                   0
DeviceProtection                               1
TechSupport                                    1
StreamingTV                                    1
StreamingMovies                                0
PaperlessBilling                               1
MonthlyCharges                             52.55
TotalCharges                                    
Churn                                         No
gender_Male                                    0
InternetService_DSL                            1
InternetService_Fiber optic                    0
InternetService_No                             0
Contract_Month-to-month                        0
Contract_One year                              0
Contract_Two year                              1
PaymentMethod_Bank transfer (automatic)        1
PaymentMethod_Credit card (automatic)          0
PaymentMethod_Electronic check                 0
PaymentMethod_Mailed check                     0
Name: 488, dtype: object

From the above we observe that TotalCharges is not filled in for this record... Let's check all records where tenure == 0.¶

In [105]:
pd.set_option('display.max_columns', None)
In [106]:
df.where(df['tenure'] == 0).dropna()
Out[106]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies PaperlessBilling MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_One year Contract_Two year PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check
488 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 52.55 No 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0
753 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 20.25 No 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
936 0.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 80.85 No 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
1082 0.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 25.75 No 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
1340 0.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 56.05 No 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0
3331 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 19.85 No 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
3826 0.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 25.35 No 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
4380 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 20.00 No 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
5218 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 19.70 No 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
6670 0.0 1.0 1.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 73.35 No 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
6754 0.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 61.90 No 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0

TotalCharges is missing for every customer with tenure == 0. We can't fill it with 0: the MonthlyCharges column records what the customer owes for the current month, so we fill each blank with that customer's own MonthlyCharges value...¶

In [107]:
to_change = df.where(df['TotalCharges'] == ' ').dropna()
In [108]:
to_change.index
Out[108]:
Index([488, 753, 936, 1082, 1340, 3331, 3826, 4380, 5218, 6670, 6754], dtype='int64')
In [109]:
# fill each blank with that customer's own MonthlyCharges
for x in to_change.index:
    df.loc[x, 'TotalCharges'] = df.loc[x, 'MonthlyCharges']
In [110]:
df.where(df['tenure'] == 0).dropna()
Out[110]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies PaperlessBilling MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_One year Contract_Two year PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check
488 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 52.55 52.55 No 0.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0
753 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 20.25 20.25 No 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
936 0.0 1.0 1.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 1.0 1.0 0.0 80.85 80.85 No 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
1082 0.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 25.75 25.75 No 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
1340 0.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0 56.05 56.05 No 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0
3331 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 19.85 19.85 No 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
3826 0.0 1.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 25.35 25.35 No 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
4380 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 20.00 20.0 No 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
5218 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 19.70 19.7 No 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
6670 0.0 1.0 1.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 73.35 73.35 No 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0
6754 0.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 61.90 61.9 No 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0
In [111]:
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'])
In [112]:
df.dtypes
Out[112]:
SeniorCitizen                                int64
Partner                                      int32
Dependents                                   int32
tenure                                       int64
PhoneService                                 int32
MultipleLines                                int32
OnlineSecurity                               int32
OnlineBackup                                 int32
DeviceProtection                             int32
TechSupport                                  int32
StreamingTV                                  int32
StreamingMovies                              int32
PaperlessBilling                             int32
MonthlyCharges                             float64
TotalCharges                               float64
Churn                                       object
gender_Male                                  int32
InternetService_DSL                          int32
InternetService_Fiber optic                  int32
InternetService_No                           int32
Contract_Month-to-month                      int32
Contract_One year                            int32
Contract_Two year                            int32
PaymentMethod_Bank transfer (automatic)      int32
PaymentMethod_Credit card (automatic)        int32
PaymentMethod_Electronic check               int32
PaymentMethod_Mailed check                   int32
dtype: object

18. Churn¶

In [113]:
df['Churn'].isnull().sum()
Out[113]:
0
In [114]:
df['Churn'].unique()
Out[114]:
array(['No', 'Yes'], dtype=object)
In [115]:
df['Churn'] = le.fit_transform(df['Churn'])
In [116]:
df
Out[116]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies PaperlessBilling MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_One year Contract_Two year PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check
0 0 1 0 1 0 0 0 1 0 0 0 0 1 29.85 29.85 0 0 1 0 0 1 0 0 0 0 1 0
1 0 0 0 34 1 0 1 0 1 0 0 0 0 56.95 1889.50 0 1 1 0 0 0 1 0 0 0 0 1
2 0 0 0 2 1 0 1 1 0 0 0 0 1 53.85 108.15 1 1 1 0 0 1 0 0 0 0 0 1
3 0 0 0 45 0 0 1 0 1 1 0 0 0 42.30 1840.75 0 1 1 0 0 0 1 0 1 0 0 0
4 0 0 0 2 1 0 0 0 0 0 0 0 1 70.70 151.65 1 0 0 1 0 1 0 0 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 1 0 1 1 1 1 1 84.80 1990.50 0 1 1 0 0 0 1 0 0 0 0 1
7039 0 1 1 72 1 1 0 1 1 0 1 1 1 103.20 7362.90 0 0 0 1 0 0 1 0 0 1 0 0
7040 0 1 1 11 0 0 1 0 0 0 0 0 1 29.60 346.45 0 0 1 0 0 1 0 0 0 0 1 0
7041 1 1 0 4 1 1 0 0 0 0 0 0 1 74.40 306.60 1 1 0 1 0 1 0 0 0 0 0 1
7042 0 0 0 66 1 0 1 0 1 1 1 1 1 105.65 6844.50 0 1 0 1 0 0 0 1 1 0 0 0

7043 rows × 27 columns

In [117]:
df.dtypes
Out[117]:
SeniorCitizen                                int64
Partner                                      int32
Dependents                                   int32
tenure                                       int64
PhoneService                                 int32
MultipleLines                                int32
OnlineSecurity                               int32
OnlineBackup                                 int32
DeviceProtection                             int32
TechSupport                                  int32
StreamingTV                                  int32
StreamingMovies                              int32
PaperlessBilling                             int32
MonthlyCharges                             float64
TotalCharges                               float64
Churn                                        int32
gender_Male                                  int32
InternetService_DSL                          int32
InternetService_Fiber optic                  int32
InternetService_No                           int32
Contract_Month-to-month                      int32
Contract_One year                            int32
Contract_Two year                            int32
PaymentMethod_Bank transfer (automatic)      int32
PaymentMethod_Credit card (automatic)        int32
PaymentMethod_Electronic check               int32
PaymentMethod_Mailed check                   int32
dtype: object

Exploratory Data Analysis¶

First we check the class balance of the dataset on the dependent feature, Churn...¶

In [118]:
fig = plx.histogram(df, x = 'Churn', title = "Churn", color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()
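The imbalance the histogram shows can be quantified directly with `value_counts(normalize=True)`. A sketch using the class counts from this dataset (5174 "No" vs 1869 "Yes" out of 7043 rows):

```python
import pandas as pd

# class counts from this dataset: 5174 'No' vs 1869 'Yes' out of 7043 rows
churn = pd.Series(['No'] * 5174 + ['Yes'] * 1869)

shares = churn.value_counts(normalize=True)
print(shares.round(4))  # No ~0.7346, Yes ~0.2654
```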

This histogram clearly shows that the dataset is imbalanced...¶

Let's first perform all tasks without balancing the dataset, then compare performance after balancing it..¶

In [119]:
temp_df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
In [120]:
temp_df.dtypes
Out[120]:
customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object
In [121]:
import plotly.express as plx
In [122]:
fig = plx.bar(temp_df, x = 'gender', color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

From the above bar chart, we confirm that roughly the same amount of churn occurs in both genders.¶

In [123]:
fig = plx.histogram(temp_df, x = 'SeniorCitizen', color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

Out of 1142 SeniorCitizens, 476 churned.¶

Out of 5901 non-SeniorCitizens, only 1393 churned.¶

The churn percentages are:¶

SeniorCitizen - 41.68%¶

Non-SeniorCitizen - 23.6%¶

Nearly half of the SeniorCitizens churned.¶
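Per-group churn rates like these can be computed with a single `groupby`. A sketch rebuilding the SeniorCitizen figures from the counts quoted above:

```python
import pandas as pd

# rebuilt from the counts quoted above: 476/1142 seniors and
# 1393/5901 non-seniors churned
df = pd.DataFrame({
    'SeniorCitizen': [1] * 1142 + [0] * 5901,
    'Churn': ['Yes'] * 476 + ['No'] * 666 + ['Yes'] * 1393 + ['No'] * 4508,
})

rate = df.groupby('SeniorCitizen')['Churn'].apply(lambda s: (s == 'Yes').mean())
print((rate * 100).round(2))  # 0 -> 23.61, 1 -> 41.68
```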

In [124]:
fig = plx.histogram(temp_df, x = 'Partner', color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

Out of 3402 partners, 669, i.e., 19.66%, churned.¶

Out of 3641 non-partners, 1200, i.e., 32.95%, churned.¶

In [125]:
fig = plx.histogram(temp_df, x = 'Dependents', color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

Out of 2110 dependents, only 326, i.e., 15.45%, churned.¶

Out of 4933 non-dependents, 1543, i.e., 31.28%, churned.¶

In [126]:
fig = plx.scatter(temp_df, y = 'tenure', color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

This scatter plot is hard to read, so we'll separate the tenure values by churn value.

In [127]:
tenure_churn_yes, tenure_churn_no = [], []
In [128]:
for i in range(len(temp_df)):
    tenure = temp_df.iloc[i]['tenure']
    churn = temp_df.iloc[i]['Churn']
    if churn == 'Yes':
        tenure_churn_yes.append(tenure)
    else:
        tenure_churn_no.append(tenure)
In [129]:
fig = plx.scatter(y = tenure_churn_yes, title = 'Churn with tenure')
fig.update_traces(dict(marker_line_width=0))
fig.show()

Churn is high when tenure is low, i.e., for some reason customers leave in the early stages. Tenure is the length of time a customer stays with the business/organization.¶

In [130]:
fig = plx.scatter(y = tenure_churn_no, title = 'No Churn with Tenure')
fig.update_traces(dict(marker_line_width=0))
fig.show()

From the above scatter plot we can't observe much, because the points are spread evenly across tenure.¶

In [131]:
fig = plx.histogram(temp_df, x = 'PhoneService', color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

From the above histogram,¶

Out of 682 customers without PhoneService, 170, i.e., 24.92%, churned.¶

Out of 6361 customers with PhoneService, 1699, i.e., 26.70%, churned.¶

In [132]:
temp_df.dtypes
Out[132]:
customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object
In [133]:
temp_df['MultipleLines'].unique()
Out[133]:
array(['No phone service', 'No', 'Yes'], dtype=object)
In [134]:
temp_df.loc[temp_df['MultipleLines'] == 'No phone service', 'MultipleLines'] = 'No'
In [135]:
temp_df['MultipleLines'].unique()
Out[135]:
array(['No', 'Yes'], dtype=object)
In [136]:
fig = plx.histogram(temp_df, x = 'MultipleLines', color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

From the above histogram, we see that around 25% of customers churned in both scenarios, i.e., with and without the MultipleLines facility.¶

In [137]:
temp_df['InternetService'].unique()
Out[137]:
array(['DSL', 'Fiber optic', 'No'], dtype=object)
In [138]:
fig = plx.histogram(temp_df, x = 'InternetService', color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

From the above histogram, we clearly notice that nearly 42% of the Fiber optic customers churned. We don't know the reason behind it; it may be poor internet service.¶

In [139]:
temp_df['OnlineSecurity'].unique()
Out[139]:
array(['No', 'Yes', 'No internet service'], dtype=object)
In [140]:
temp_df.loc[temp_df['OnlineSecurity'] == 'No internet service', 'OnlineSecurity'] = 'No'
In [141]:
temp_df['OnlineSecurity'].unique()
Out[141]:
array(['No', 'Yes'], dtype=object)
In [142]:
fig = plx.histogram(temp_df, x = 'OnlineSecurity', color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

31.3% of the customers without OnlineSecurity churned. The churn percentage is lower among customers who have OnlineSecurity.¶

Let's check Churn on Total charges and Monthly Charges..¶

In [143]:
fig = plx.scatter(temp_df, x = 'MonthlyCharges', color = 'Churn')
fig.show()
In [144]:
fig = plx.scatter(temp_df, x = 'TotalCharges', color = 'Churn')
fig.show()

From the above two visualizations, we see no clear relationship between TotalCharges/MonthlyCharges and Churn.¶

In [145]:
plx.imshow(df.corr(), text_auto = True, height = 1750, width = 1750)

From the above heatmap, we see that no independent feature is highly correlated with the dependent feature.¶

Some features have a decent correlation with Churn, i.e.:¶

Contract_Month-to-month - 0.4¶

Contract_Two year - (-0.3)¶

PaymentMethod_Electronic check - 0.3¶

InternetService_Fiber optic - 0.3¶
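These figures can be pulled out programmatically by sorting the Churn column of the correlation matrix by magnitude. A sketch on a small toy frame standing in for the fully encoded DataFrame (the column names here are illustrative):

```python
import pandas as pd

# toy numeric frame standing in for the fully encoded churn DataFrame
df = pd.DataFrame({
    'Churn':          [1, 0, 1, 0, 1, 0, 0, 0],
    'month_to_month': [1, 0, 1, 0, 1, 1, 0, 0],
    'two_year':       [0, 1, 0, 1, 0, 0, 1, 0],
})

# correlation of every feature with the target, strongest magnitude first
corr = df.corr()['Churn'].drop('Churn').sort_values(key=abs, ascending=False)
print(corr.index[0])  # month_to_month
```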

In [146]:
temp_df['Churn'].unique()
Out[146]:
array(['No', 'Yes'], dtype=object)
In [147]:
churn_yes_df = temp_df.where((temp_df['Churn'] == 'Yes')).dropna()
churn_yes_df
Out[147]:
customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
2 3668-QPYBK Male 0.0 No No 2.0 Yes No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
4 9237-HQITU Female 0.0 No No 2.0 Yes No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes
5 9305-CDSKC Female 0.0 No No 8.0 Yes Yes Fiber optic No No Yes No Yes Yes Month-to-month Yes Electronic check 99.65 820.5 Yes
8 7892-POOKP Female 0.0 Yes No 28.0 Yes Yes Fiber optic No No Yes Yes Yes Yes Month-to-month Yes Electronic check 104.80 3046.05 Yes
13 0280-XJGEX Male 0.0 No No 49.0 Yes Yes Fiber optic No Yes Yes No Yes Yes Month-to-month Yes Bank transfer (automatic) 103.70 5036.3 Yes
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7021 1699-HPSBG Male 0.0 No No 12.0 Yes No DSL No No No Yes Yes No One year Yes Electronic check 59.80 727.8 Yes
7026 8775-CEBBJ Female 0.0 No No 9.0 Yes No DSL No No No No No No Month-to-month Yes Bank transfer (automatic) 44.20 403.35 Yes
7032 6894-LFHLY Male 1.0 No No 1.0 Yes Yes Fiber optic No No No No No No Month-to-month Yes Electronic check 75.75 75.75 Yes
7034 0639-TSIQW Female 0.0 No No 67.0 Yes Yes Fiber optic Yes Yes Yes No Yes No Month-to-month Yes Credit card (automatic) 102.95 6886.25 Yes
7041 8361-LTMKD Male 1.0 Yes No 4.0 Yes Yes Fiber optic No No No No No No Month-to-month Yes Mailed check 74.40 306.6 Yes

1869 rows × 21 columns

There are 1869 churned customers...
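`where(...).dropna()` works here, but it casts every numeric column to float (note the 0.0/1.0 values in the output above). Plain boolean indexing selects the same rows while keeping the original dtypes. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'tenure': [1, 34, 2], 'Churn': ['No', 'No', 'Yes']})

# a boolean mask selects the matching rows without the float conversion
# that where(...).dropna() forces on numeric columns
churn_yes = df[df['Churn'] == 'Yes']

print(len(churn_yes), churn_yes['tenure'].dtype)  # 1 int64
```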

In [148]:
churn_yes_df['Dependents'].value_counts()
Out[148]:
Dependents
No     1543
Yes     326
Name: count, dtype: int64
In [149]:
fig = plx.histogram(churn_yes_df, x = 'Dependents', title = "Churn on Dependents")
fig.update_traces(dict(marker_line_width=0))
fig.show()

From the above bar chart, most churned customers are non-dependents.¶

In [150]:
churn_yes_df['tenure'].value_counts()
Out[150]:
tenure
1.0     380
2.0     123
3.0      94
4.0      83
5.0      64
       ... 
60.0      6
72.0      6
62.0      5
64.0      4
63.0      4
Name: count, Length: 72, dtype: int64
In [151]:
plx.histogram(x = churn_yes_df['tenure'])

This histogram shows that customers were leaving in the early stages of their tenure.¶

In [152]:
churn_yes_df['PhoneService'].value_counts()
Out[152]:
PhoneService
Yes    1699
No      170
Name: count, dtype: int64
In [153]:
fig = plx.histogram(churn_yes_df, x = 'PhoneService', title = "Churn on PhoneService")
fig.update_traces(dict(marker_line_width=0))
fig.show()

Customers are churning even though they have PhoneService, so PhoneService alone doesn't explain the churn.¶

In [154]:
churn_yes_df['InternetService'].value_counts()
Out[154]:
InternetService
Fiber optic    1297
DSL             459
No              113
Name: count, dtype: int64
In [155]:
fig = plx.histogram(churn_yes_df, x = 'InternetService', title = "Churn on InternetService")
fig.update_traces(dict(marker_line_width=0))
fig.show()

Customers with a Fiber optic connection were churning the most. This may be due to poor service quality on the Fiber optic InternetService.¶

In [156]:
churn_yes_df['Contract'].value_counts()
Out[156]:
Contract
Month-to-month    1655
One year           166
Two year            48
Name: count, dtype: int64
In [157]:
fig = plx.histogram(churn_yes_df, x = 'Contract', title = "Churn on Contract")
fig.update_traces(dict(marker_line_width=0))
fig.show()

Out of 1869 churns, 1655 were Month-to-Month contract customers...¶

In [158]:
churn_yes_df['PaymentMethod'].value_counts()
Out[158]:
PaymentMethod
Electronic check             1071
Mailed check                  308
Bank transfer (automatic)     258
Credit card (automatic)       232
Name: count, dtype: int64
In [159]:
fig = plx.histogram(churn_yes_df, x = 'PaymentMethod', title = "PaymentMethod")
fig.update_traces(dict(marker_line_width=0))
fig.show()

Out of 1869 churns, 1071 were using Payment method as Electronic Check...¶

Feature Scaling¶

In [160]:
df
Out[160]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies PaperlessBilling MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_One year Contract_Two year PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check
0 0 1 0 1 0 0 0 1 0 0 0 0 1 29.85 29.85 0 0 1 0 0 1 0 0 0 0 1 0
1 0 0 0 34 1 0 1 0 1 0 0 0 0 56.95 1889.50 0 1 1 0 0 0 1 0 0 0 0 1
2 0 0 0 2 1 0 1 1 0 0 0 0 1 53.85 108.15 1 1 1 0 0 1 0 0 0 0 0 1
3 0 0 0 45 0 0 1 0 1 1 0 0 0 42.30 1840.75 0 1 1 0 0 0 1 0 1 0 0 0
4 0 0 0 2 1 0 0 0 0 0 0 0 1 70.70 151.65 1 0 0 1 0 1 0 0 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0 1 1 24 1 1 1 0 1 1 1 1 1 84.80 1990.50 0 1 1 0 0 0 1 0 0 0 0 1
7039 0 1 1 72 1 1 0 1 1 0 1 1 1 103.20 7362.90 0 0 0 1 0 0 1 0 0 1 0 0
7040 0 1 1 11 0 0 1 0 0 0 0 0 1 29.60 346.45 0 0 1 0 0 1 0 0 0 0 1 0
7041 1 1 0 4 1 1 0 0 0 0 0 0 1 74.40 306.60 1 1 0 1 0 1 0 0 0 0 0 1
7042 0 0 0 66 1 0 1 0 1 1 1 1 1 105.65 6844.50 0 1 0 1 0 0 0 1 1 0 0 0

7043 rows × 27 columns

Let's scale the DataFrame to bring all the features onto the same scale.¶

In [161]:
from sklearn.preprocessing import MinMaxScaler
In [162]:
mms = MinMaxScaler()
In [163]:
df_new = mms.fit_transform(df)
In [164]:
df = pd.DataFrame(df_new, columns = df.columns)
In [165]:
df
Out[165]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies PaperlessBilling MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_One year Contract_Two year PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check
0 0.0 1.0 0.0 0.013889 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.115423 0.001275 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
1 0.0 0.0 0.0 0.472222 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.385075 0.215867 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
2 0.0 0.0 0.0 0.027778 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.354229 0.010310 1.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
3 0.0 0.0 0.0 0.625000 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.239303 0.210241 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.027778 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.521891 0.015330 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0.0 1.0 1.0 0.333333 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.662189 0.227521 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
7039 0.0 1.0 1.0 1.000000 1.0 1.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 0.845274 0.847461 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0
7040 0.0 1.0 1.0 0.152778 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.112935 0.037809 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
7041 1.0 1.0 0.0 0.055556 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.558706 0.033210 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
7042 0.0 0.0 0.0 0.916667 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.869652 0.787641 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0

7043 rows × 27 columns

Now everything is scaled into the range 0 to 1.
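MinMaxScaler computes x' = (x - min) / (max - min) for each column. A minimal numpy check against the tenure column (which ranges from 0 to 72 in this dataset), reproducing the 0.472222 seen for tenure = 34 in the output above:

```python
import numpy as np

# MinMaxScaler applies x' = (x - min) / (max - min) per column;
# tenure ranges from 0 to 72 in this dataset
tenure = np.array([0.0, 34.0, 72.0])
scaled = (tenure - tenure.min()) / (tenure.max() - tenure.min())

print(scaled.round(6))  # [0. 0.472222 1.]
```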

Creating a Model¶

Model 1¶

In [166]:
from keras.layers import Dense,Dropout
from keras import Sequential
In [167]:
ANN_model = Sequential()
In [168]:
# Adding Input Layer to ANN
ANN_model.add(Dense(units = 27, activation = 'relu'))
In [169]:
# Adding 1st Hidden Layer to the ANN
ANN_model.add(Dense(units = 15, activation = 'relu'))
ANN_model.add(Dropout(0.4))
In [170]:
# Adding 2nd Hidden Layer to the ANN
ANN_model.add(Dense(units = 7, activation = 'relu'))
ANN_model.add(Dropout(0.3))
In [171]:
# Adding Output Layer to the ANN
ANN_model.add(Dense(units = 1, activation = 'sigmoid'))
In [172]:
ANN_model.compile(optimizer = 'adam',
                  loss = 'binary_crossentropy',
                  metrics = ['accuracy'])
In [173]:
x_train, x_test, y_train, y_test = train_test_split(df.drop(['Churn'], axis = 1), df['Churn'], test_size = 0.2, random_state = 35)
In [174]:
import tensorflow as tf
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="accuracy",
    min_delta=0.0001,
    patience=20,
    verbose=1,
    mode="auto",
    baseline=None,
    restore_best_weights=True
)
In [175]:
model_history = ANN_model.fit(x_train, y_train, batch_size = 2, epochs = 150, validation_data = (x_test,y_test), callbacks = early_stopping )
Epoch 1/150
2817/2817 [==============================] - 14s 4ms/step - loss: 0.5081 - accuracy: 0.7469 - val_loss: 0.4054 - val_accuracy: 0.8070
Epoch 2/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4685 - accuracy: 0.7760 - val_loss: 0.3948 - val_accuracy: 0.8148
Epoch 3/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4550 - accuracy: 0.7812 - val_loss: 0.4082 - val_accuracy: 0.8204
Epoch 4/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4552 - accuracy: 0.7849 - val_loss: 0.3983 - val_accuracy: 0.8098
Epoch 5/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4506 - accuracy: 0.7863 - val_loss: 0.3942 - val_accuracy: 0.8211
Epoch 6/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4410 - accuracy: 0.7973 - val_loss: 0.4095 - val_accuracy: 0.8070
Epoch 7/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4435 - accuracy: 0.7868 - val_loss: 0.3960 - val_accuracy: 0.8112
Epoch 8/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4451 - accuracy: 0.7918 - val_loss: 0.3951 - val_accuracy: 0.8098
Epoch 9/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4426 - accuracy: 0.7875 - val_loss: 0.3963 - val_accuracy: 0.8176
Epoch 10/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4388 - accuracy: 0.7874 - val_loss: 0.3956 - val_accuracy: 0.8126
Epoch 11/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4392 - accuracy: 0.7948 - val_loss: 0.3959 - val_accuracy: 0.8112
Epoch 12/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4345 - accuracy: 0.7897 - val_loss: 0.3897 - val_accuracy: 0.8219
Epoch 13/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4337 - accuracy: 0.7927 - val_loss: 0.3914 - val_accuracy: 0.8233
Epoch 14/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4370 - accuracy: 0.7923 - val_loss: 0.3917 - val_accuracy: 0.8169
Epoch 15/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4358 - accuracy: 0.7904 - val_loss: 0.3904 - val_accuracy: 0.8247
Epoch 16/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4303 - accuracy: 0.7980 - val_loss: 0.4021 - val_accuracy: 0.8077
Epoch 17/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4275 - accuracy: 0.7952 - val_loss: 0.3934 - val_accuracy: 0.8183
Epoch 18/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4328 - accuracy: 0.7953 - val_loss: 0.3975 - val_accuracy: 0.8098
Epoch 19/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4275 - accuracy: 0.7948 - val_loss: 0.3975 - val_accuracy: 0.8133
Epoch 20/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4226 - accuracy: 0.7968 - val_loss: 0.3950 - val_accuracy: 0.8204
Epoch 21/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4252 - accuracy: 0.7941 - val_loss: 0.3907 - val_accuracy: 0.8233
Epoch 22/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4278 - accuracy: 0.7984 - val_loss: 0.3929 - val_accuracy: 0.8176
Epoch 23/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4219 - accuracy: 0.7985 - val_loss: 0.3975 - val_accuracy: 0.8197
Epoch 24/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4178 - accuracy: 0.7985 - val_loss: 0.3994 - val_accuracy: 0.8105
Epoch 25/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4176 - accuracy: 0.8012 - val_loss: 0.4008 - val_accuracy: 0.8041
Epoch 26/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4179 - accuracy: 0.7984 - val_loss: 0.3999 - val_accuracy: 0.8084
Epoch 27/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4179 - accuracy: 0.7991 - val_loss: 0.4014 - val_accuracy: 0.8084
Epoch 28/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4177 - accuracy: 0.8003 - val_loss: 0.4033 - val_accuracy: 0.8133
Epoch 29/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4162 - accuracy: 0.8012 - val_loss: 0.3943 - val_accuracy: 0.8155
Epoch 30/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4114 - accuracy: 0.8028 - val_loss: 0.4029 - val_accuracy: 0.8133
Epoch 31/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4127 - accuracy: 0.7941 - val_loss: 0.4018 - val_accuracy: 0.8133
Epoch 32/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4137 - accuracy: 0.8019 - val_loss: 0.4062 - val_accuracy: 0.8091
Epoch 33/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4114 - accuracy: 0.8042 - val_loss: 0.4003 - val_accuracy: 0.8126
Epoch 34/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4120 - accuracy: 0.8035 - val_loss: 0.4083 - val_accuracy: 0.8105
Epoch 35/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4114 - accuracy: 0.8035 - val_loss: 0.4066 - val_accuracy: 0.8070
Epoch 36/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4069 - accuracy: 0.8028 - val_loss: 0.4109 - val_accuracy: 0.8077
Epoch 37/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4101 - accuracy: 0.8028 - val_loss: 0.4146 - val_accuracy: 0.8084
Epoch 38/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4056 - accuracy: 0.7993 - val_loss: 0.4067 - val_accuracy: 0.8084
Epoch 39/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4087 - accuracy: 0.8039 - val_loss: 0.4095 - val_accuracy: 0.8091
Epoch 40/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4052 - accuracy: 0.8049 - val_loss: 0.4062 - val_accuracy: 0.8070
Epoch 41/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4062 - accuracy: 0.8051 - val_loss: 0.4131 - val_accuracy: 0.8126
Epoch 42/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4114 - accuracy: 0.7996 - val_loss: 0.4093 - val_accuracy: 0.8105
Epoch 43/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4053 - accuracy: 0.8055 - val_loss: 0.4035 - val_accuracy: 0.8048
Epoch 44/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4028 - accuracy: 0.8087 - val_loss: 0.4151 - val_accuracy: 0.8062
Epoch 45/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3968 - accuracy: 0.8090 - val_loss: 0.4119 - val_accuracy: 0.8084
Epoch 46/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4073 - accuracy: 0.8069 - val_loss: 0.4066 - val_accuracy: 0.8062
Epoch 47/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4056 - accuracy: 0.8044 - val_loss: 0.4089 - val_accuracy: 0.8141
Epoch 48/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4072 - accuracy: 0.8040 - val_loss: 0.4115 - val_accuracy: 0.8105
Epoch 49/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4028 - accuracy: 0.8076 - val_loss: 0.4109 - val_accuracy: 0.8098
Epoch 50/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4027 - accuracy: 0.8074 - val_loss: 0.4141 - val_accuracy: 0.8119
Epoch 51/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4036 - accuracy: 0.8072 - val_loss: 0.4157 - val_accuracy: 0.8077
Epoch 52/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4005 - accuracy: 0.8065 - val_loss: 0.4073 - val_accuracy: 0.8070
Epoch 53/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4025 - accuracy: 0.8048 - val_loss: 0.4161 - val_accuracy: 0.8077
Epoch 54/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4024 - accuracy: 0.8042 - val_loss: 0.4077 - val_accuracy: 0.8098
Epoch 55/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4002 - accuracy: 0.8046 - val_loss: 0.4103 - val_accuracy: 0.8062
Epoch 56/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4009 - accuracy: 0.8081 - val_loss: 0.4157 - val_accuracy: 0.8119
Epoch 57/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4012 - accuracy: 0.8081 - val_loss: 0.4056 - val_accuracy: 0.8105
Epoch 58/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3977 - accuracy: 0.8049 - val_loss: 0.4151 - val_accuracy: 0.8126
Epoch 59/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3979 - accuracy: 0.8012 - val_loss: 0.4065 - val_accuracy: 0.8098
Epoch 60/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4016 - accuracy: 0.8051 - val_loss: 0.4178 - val_accuracy: 0.8119
Epoch 61/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3949 - accuracy: 0.8060 - val_loss: 0.4105 - val_accuracy: 0.8098
Epoch 62/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3983 - accuracy: 0.8065 - val_loss: 0.4164 - val_accuracy: 0.8141
Epoch 63/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3970 - accuracy: 0.8074 - val_loss: 0.4245 - val_accuracy: 0.8148
Epoch 64/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3977 - accuracy: 0.8064 - val_loss: 0.4140 - val_accuracy: 0.8062
Epoch 65/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3936 - accuracy: 0.8103 - val_loss: 0.4137 - val_accuracy: 0.8055
Epoch 66/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3942 - accuracy: 0.8074 - val_loss: 0.4251 - val_accuracy: 0.8119
Epoch 67/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3985 - accuracy: 0.8074 - val_loss: 0.4232 - val_accuracy: 0.8077
Epoch 68/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3867 - accuracy: 0.8117 - val_loss: 0.4289 - val_accuracy: 0.8070
Epoch 69/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3969 - accuracy: 0.8074 - val_loss: 0.4240 - val_accuracy: 0.8062
Epoch 70/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3921 - accuracy: 0.8094 - val_loss: 0.4273 - val_accuracy: 0.7999
Epoch 71/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3964 - accuracy: 0.8044 - val_loss: 0.4254 - val_accuracy: 0.8091
Epoch 72/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3910 - accuracy: 0.8064 - val_loss: 0.4204 - val_accuracy: 0.8013
Epoch 73/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3948 - accuracy: 0.8069 - val_loss: 0.4322 - val_accuracy: 0.8126
Epoch 74/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3934 - accuracy: 0.8129 - val_loss: 0.4299 - val_accuracy: 0.8126
Epoch 75/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3910 - accuracy: 0.8129 - val_loss: 0.4237 - val_accuracy: 0.8112
Epoch 76/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3934 - accuracy: 0.8106 - val_loss: 0.4285 - val_accuracy: 0.8105
Epoch 77/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3911 - accuracy: 0.8147 - val_loss: 0.4332 - val_accuracy: 0.8070
Epoch 78/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3852 - accuracy: 0.8159 - val_loss: 0.4346 - val_accuracy: 0.8055
Epoch 79/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3920 - accuracy: 0.8053 - val_loss: 0.4263 - val_accuracy: 0.7991
Epoch 80/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3880 - accuracy: 0.8119 - val_loss: 0.4288 - val_accuracy: 0.8105
Epoch 81/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3869 - accuracy: 0.8136 - val_loss: 0.4255 - val_accuracy: 0.8126
Epoch 82/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3885 - accuracy: 0.8113 - val_loss: 0.4404 - val_accuracy: 0.8048
Epoch 83/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3904 - accuracy: 0.8064 - val_loss: 0.4423 - val_accuracy: 0.8091
Epoch 84/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3873 - accuracy: 0.8124 - val_loss: 0.4267 - val_accuracy: 0.8027
Epoch 85/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3895 - accuracy: 0.8135 - val_loss: 0.4224 - val_accuracy: 0.8112
Epoch 86/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3815 - accuracy: 0.8152 - val_loss: 0.4383 - val_accuracy: 0.8034
Epoch 87/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3858 - accuracy: 0.8142 - val_loss: 0.4377 - val_accuracy: 0.8070
Epoch 88/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3856 - accuracy: 0.8151 - val_loss: 0.4360 - val_accuracy: 0.7991
Epoch 89/150
2817/2817 [==============================] - 13s 4ms/step - loss: 0.3886 - accuracy: 0.8142 - val_loss: 0.4440 - val_accuracy: 0.8013
Epoch 90/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3824 - accuracy: 0.8156 - val_loss: 0.4518 - val_accuracy: 0.8041
Epoch 91/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3826 - accuracy: 0.8154 - val_loss: 0.4380 - val_accuracy: 0.8055
Epoch 92/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3834 - accuracy: 0.8131 - val_loss: 0.4475 - val_accuracy: 0.8077
Epoch 93/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3858 - accuracy: 0.8129 - val_loss: 0.4433 - val_accuracy: 0.8041
Epoch 94/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.3822 - accuracy: 0.8101 - val_loss: 0.4455 - val_accuracy: 0.7991
Epoch 95/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3897 - accuracy: 0.8156 - val_loss: 0.4531 - val_accuracy: 0.8041
Epoch 96/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3843 - accuracy: 0.8074 - val_loss: 0.4401 - val_accuracy: 0.8013
Epoch 97/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3829 - accuracy: 0.8117 - val_loss: 0.4347 - val_accuracy: 0.8034
Epoch 98/150
2805/2817 [============================>.] - ETA: 0s - loss: 0.3791 - accuracy: 0.8127Restoring model weights from the end of the best epoch: 78.
2817/2817 [==============================] - 11s 4ms/step - loss: 0.3787 - accuracy: 0.8127 - val_loss: 0.4418 - val_accuracy: 0.8077
Epoch 98: early stopping

The accuracy stopped improving for an extended stretch, so early stopping ended training at epoch 98 and restored the weights from the best epoch (78). Our final training accuracy is 81.27%.¶

In [176]:
model_history.history['val_accuracy']
Out[176]:
[0.8069552779197693,
 0.8147622346878052,
 0.8204400539398193,
 0.8097941875457764,
 0.8211497664451599,
 0.8069552779197693,
 0.8112136125564575,
 0.8097941875457764,
 0.8176011443138123,
 0.8126330971717834,
 0.8112136125564575,
 0.8218594789505005,
 0.8232789039611816,
 0.8168914318084717,
 0.8246983885765076,
 0.8076649904251099,
 0.8183108568191528,
 0.8097941875457764,
 0.813342809677124,
 0.8204400539398193,
 0.8232789039611816,
 0.8176011443138123,
 0.819730281829834,
 0.8105039000511169,
 0.8041163682937622,
 0.8083747625350952,
 0.8083747625350952,
 0.813342809677124,
 0.8154719471931458,
 0.813342809677124,
 0.813342809677124,
 0.8090844750404358,
 0.8126330971717834,
 0.8105039000511169,
 0.8069552779197693,
 0.8076649904251099,
 0.8083747625350952,
 0.8083747625350952,
 0.8090844750404358,
 0.8069552779197693,
 0.8126330971717834,
 0.8105039000511169,
 0.8048261404037476,
 0.8062455654144287,
 0.8083747625350952,
 0.8062455654144287,
 0.8140525221824646,
 0.8105039000511169,
 0.8097941875457764,
 0.8119233250617981,
 0.8076649904251099,
 0.8069552779197693,
 0.8076649904251099,
 0.8097941875457764,
 0.8062455654144287,
 0.8119233250617981,
 0.8105039000511169,
 0.8126330971717834,
 0.8097941875457764,
 0.8119233250617981,
 0.8097941875457764,
 0.8140525221824646,
 0.8147622346878052,
 0.8062455654144287,
 0.8055358529090881,
 0.8119233250617981,
 0.8076649904251099,
 0.8069552779197693,
 0.8062455654144287,
 0.799858033657074,
 0.8090844750404358,
 0.8012775182723999,
 0.8126330971717834,
 0.8126330971717834,
 0.8112136125564575,
 0.8105039000511169,
 0.8069552779197693,
 0.8055358529090881,
 0.7991483211517334,
 0.8105039000511169,
 0.8126330971717834,
 0.8048261404037476,
 0.8090844750404358,
 0.802696943283081,
 0.8112136125564575,
 0.8034066557884216,
 0.8069552779197693,
 0.7991483211517334,
 0.8012775182723999,
 0.8041163682937622,
 0.8055358529090881,
 0.8076649904251099,
 0.8041163682937622,
 0.7991483211517334,
 0.8041163682937622,
 0.8012775182723999,
 0.8034066557884216,
 0.8076649904251099]
In [177]:
model_history.history['accuracy']
Out[177]:
[0.7468938827514648,
 0.776002824306488,
 0.7811501622200012,
 0.7848775386810303,
 0.7862975001335144,
 0.7973020672798157,
 0.786829948425293,
 0.791799783706665,
 0.7875399589538574,
 0.7873624563217163,
 0.7948172092437744,
 0.7896698713302612,
 0.7926872372627258,
 0.7923322916030884,
 0.7903798222541809,
 0.7980120778083801,
 0.7951721549034119,
 0.795349657535553,
 0.7948172092437744,
 0.7967696189880371,
 0.79410719871521,
 0.7983670830726624,
 0.7985445261001587,
 0.7985445261001587,
 0.8012069463729858,
 0.7983670830726624,
 0.799077033996582,
 0.800319492816925,
 0.8012069463729858,
 0.8028044104576111,
 0.79410719871521,
 0.8019169569015503,
 0.8042243719100952,
 0.8035143613815308,
 0.8035143613815308,
 0.8028044104576111,
 0.8028044104576111,
 0.7992545366287231,
 0.803869366645813,
 0.8049343228340149,
 0.805111825466156,
 0.7996095418930054,
 0.8054668307304382,
 0.808661699295044,
 0.8090167045593262,
 0.8068867325782776,
 0.8044018745422363,
 0.8040468692779541,
 0.807596743106842,
 0.8074192404747009,
 0.8072417378425598,
 0.8065317869186401,
 0.8047568202018738,
 0.8042243719100952,
 0.8045793175697327,
 0.8081291913986206,
 0.8081291913986206,
 0.8049343228340149,
 0.8012069463729858,
 0.805111825466156,
 0.8059992790222168,
 0.8065317869186401,
 0.8074192404747009,
 0.806354284286499,
 0.8102591633796692,
 0.8074192404747009,
 0.8074192404747009,
 0.8116790652275085,
 0.8074192404747009,
 0.8093716502189636,
 0.8044018745422363,
 0.806354284286499,
 0.8068867325782776,
 0.8129215240478516,
 0.8129215240478516,
 0.8106141090393066,
 0.8146964907646179,
 0.8159389495849609,
 0.8052893280982971,
 0.8118565678596497,
 0.813631534576416,
 0.8113241195678711,
 0.806354284286499,
 0.812389075756073,
 0.8134540319442749,
 0.8152289390563965,
 0.8141639828681946,
 0.8150514960289001,
 0.8141639828681946,
 0.8155839443206787,
 0.8154064416885376,
 0.8130990266799927,
 0.8129215240478516,
 0.8100816607475281,
 0.8155839443206787,
 0.8074192404747009,
 0.8116790652275085,
 0.8127440810203552]
In [178]:
model_history.history.keys()
Out[178]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

The model_history variable records the loss and accuracy metrics for every training epoch.¶
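Because the history is just a dict of per-epoch lists, useful summaries fall out directly. For example, finding the epoch with the lowest validation loss (sketched with a small hypothetical history dict of the same shape; the numbers are illustrative, not the run above):

```python
# A history dict with the same structure Keras produces (illustrative values).
history = {
    'loss':         [0.51, 0.47, 0.45, 0.44],
    'accuracy':     [0.75, 0.78, 0.78, 0.79],
    'val_loss':     [0.41, 0.39, 0.41, 0.40],
    'val_accuracy': [0.81, 0.81, 0.82, 0.81],
}

# Epoch (0-indexed) with the lowest validation loss.
best_epoch = min(range(len(history['val_loss'])), key=history['val_loss'].__getitem__)
print(best_epoch, history['val_loss'][best_epoch])  # -> 1 0.39
```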

In [179]:
import plotly.graph_objects as go
In [180]:
fig_1 = go.Figure()
In [181]:
fig_1.add_trace(go.Scatter(x =np.arange(0,len(model_history.history['accuracy'])),
                         y = model_history.history['val_accuracy'],
                         mode='lines+markers',
                         name='val_accuracy'))
fig_1.add_trace(go.Scatter(x =np.arange(0,len(model_history.history['accuracy'])),
                         y = model_history.history['accuracy'],
                         mode='lines+markers',
                         name='Accuracy'))
fig_1.update_layout(title = 'ACCURACY vs VALIDATION_ACCURACY')

fig_1.update_xaxes(title_text="Epochs")
fig_1.update_yaxes(title_text="Accuracy")

fig_1.show()

From the line chart above, we see that the training and validation accuracy track each other closely after the first few epochs, with no meaningful improvement thereafter.¶

In [182]:
model_history.history['loss']
Out[182]:
[0.5081272125244141,
 0.4685124456882477,
 0.4550357758998871,
 0.455153226852417,
 0.45062726736068726,
 0.4409547448158264,
 0.44353967905044556,
 0.4450571835041046,
 0.44258907437324524,
 0.4388164281845093,
 0.43916210532188416,
 0.4344507157802582,
 0.43371817469596863,
 0.4369526207447052,
 0.4357856810092926,
 0.4303082823753357,
 0.4274821877479553,
 0.4328174889087677,
 0.4274732768535614,
 0.4226454794406891,
 0.4251917004585266,
 0.4278009831905365,
 0.4219277799129486,
 0.417755126953125,
 0.4175887107849121,
 0.41787275671958923,
 0.41787877678871155,
 0.4177219271659851,
 0.41619035601615906,
 0.41140833497047424,
 0.41274720430374146,
 0.4136614203453064,
 0.41135138273239136,
 0.4119621217250824,
 0.41142210364341736,
 0.40689873695373535,
 0.4100908637046814,
 0.40556854009628296,
 0.40873223543167114,
 0.4051817059516907,
 0.40621814131736755,
 0.41144031286239624,
 0.4052787721157074,
 0.40283939242362976,
 0.3967612385749817,
 0.4072614312171936,
 0.40564730763435364,
 0.4072452485561371,
 0.40283632278442383,
 0.4026649296283722,
 0.4035778045654297,
 0.4005022644996643,
 0.4024786949157715,
 0.4023760259151459,
 0.40020716190338135,
 0.4008631408214569,
 0.4012056887149811,
 0.3977046012878418,
 0.39786916971206665,
 0.40157219767570496,
 0.3948601186275482,
 0.39832812547683716,
 0.3970475494861603,
 0.3976713716983795,
 0.3936421275138855,
 0.3942430019378662,
 0.39846935868263245,
 0.3866622745990753,
 0.39691194891929626,
 0.3920910060405731,
 0.3963838815689087,
 0.39102184772491455,
 0.39482757449150085,
 0.39341360330581665,
 0.3909689486026764,
 0.39342477917671204,
 0.39107751846313477,
 0.38519835472106934,
 0.39197638630867004,
 0.3880450427532196,
 0.38694512844085693,
 0.38853737711906433,
 0.3903755843639374,
 0.3873341381549835,
 0.3895360231399536,
 0.3814878463745117,
 0.3858412504196167,
 0.3856278657913208,
 0.3886488080024719,
 0.3823695480823517,
 0.3826170563697815,
 0.3833910822868347,
 0.38578200340270996,
 0.38219648599624634,
 0.3896782100200653,
 0.3843458890914917,
 0.3829021453857422,
 0.37872254848480225]
In [183]:
model_history.history['val_loss']
Out[183]:
[0.40535616874694824,
 0.3948274254798889,
 0.4082051217556,
 0.39829161763191223,
 0.39419999718666077,
 0.40953198075294495,
 0.39598217606544495,
 0.3951069116592407,
 0.39632049202919006,
 0.39559921622276306,
 0.395935982465744,
 0.38972020149230957,
 0.39144089818000793,
 0.3916804790496826,
 0.3903910517692566,
 0.4021134674549103,
 0.3934386074542999,
 0.3974713683128357,
 0.39747804403305054,
 0.39503487944602966,
 0.39070501923561096,
 0.3929044008255005,
 0.397475928068161,
 0.39935338497161865,
 0.4008231461048126,
 0.3999277353286743,
 0.40143662691116333,
 0.4033142924308777,
 0.3942776620388031,
 0.4029252231121063,
 0.4018080234527588,
 0.4062395691871643,
 0.40027502179145813,
 0.4083457589149475,
 0.40662145614624023,
 0.41092896461486816,
 0.4145979583263397,
 0.40669071674346924,
 0.40952977538108826,
 0.40616244077682495,
 0.4131413698196411,
 0.4092729389667511,
 0.4034976065158844,
 0.41512757539749146,
 0.4119105339050293,
 0.40655964612960815,
 0.408929705619812,
 0.41145050525665283,
 0.41087982058525085,
 0.41411158442497253,
 0.4156970977783203,
 0.407317578792572,
 0.4160541296005249,
 0.40773844718933105,
 0.4103267192840576,
 0.4156971871852875,
 0.4056411683559418,
 0.41510361433029175,
 0.40651416778564453,
 0.41778460144996643,
 0.4104567766189575,
 0.41644802689552307,
 0.4245200455188751,
 0.41403549909591675,
 0.41373562812805176,
 0.4250999987125397,
 0.42323631048202515,
 0.42887187004089355,
 0.4240097105503082,
 0.42733845114707947,
 0.4254225194454193,
 0.4203762710094452,
 0.4321698844432831,
 0.4299393892288208,
 0.42370253801345825,
 0.4285167455673218,
 0.4332222044467926,
 0.43463361263275146,
 0.4263263940811157,
 0.4288237690925598,
 0.42554783821105957,
 0.44043630361557007,
 0.44226983189582825,
 0.42672276496887207,
 0.4223831295967102,
 0.43828094005584717,
 0.4376813769340515,
 0.43595030903816223,
 0.4439915716648102,
 0.4518167972564697,
 0.4379619061946869,
 0.44754624366760254,
 0.4432525038719177,
 0.4454522132873535,
 0.45309847593307495,
 0.44010117650032043,
 0.43467891216278076,
 0.44175300002098083]
In [184]:
fig_2 = go.Figure()
In [185]:
fig_2.add_trace(go.Scatter(x =np.arange(0,len(model_history.history['loss'])),
                         y = model_history.history['loss'],
                         mode='lines+markers',
                         name='loss'))
fig_2.add_trace(go.Scatter(x =np.arange(0,len(model_history.history['loss'])),
                         y = model_history.history['val_loss'],
                         mode='lines+markers',
                         name='val_loss'))
fig_2.update_layout(title = 'LOSS vs VALIDATION_LOSS')

fig_2.update_xaxes(title_text="Epochs")
fig_2.update_yaxes(title_text="Loss")

fig_2.show()

From the line chart above, we see that the training loss keeps decreasing while the validation loss rises as the epochs progress, which is a sign of overfitting.¶

In [186]:
ANN_model.evaluate(x_test, y_test)
45/45 [==============================] - 0s 3ms/step - loss: 0.4346 - accuracy: 0.8055
Out[186]:
[0.4346335530281067, 0.8055358529090881]
In [187]:
from sklearn.metrics import confusion_matrix, classification_report
In [188]:
predict = ANN_model.predict(x_test)
45/45 [==============================] - 0s 1ms/step
In [189]:
predict
Out[189]:
array([[9.7893542e-05],
       [7.0410240e-01],
       [6.3315587e-04],
       ...,
       [8.2363045e-01],
       [5.0050294e-01],
       [5.0984734e-01]], dtype=float32)
In [190]:
predict_new = []
for x in predict:
    if x >= 0.5:
        predict_new.append(1)
    else:
        predict_new.append(0)       
In [191]:
predict_new[-10 : ]
Out[191]:
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
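The thresholding loop above can also be written as a single vectorized NumPy step, which gives the same 0/1 labels (shown here on a few illustrative probabilities shaped like the `(n, 1)` output of `predict`):

```python
import numpy as np

# Example probabilities shaped like ANN_model.predict output: (n, 1).
predict = np.array([[0.0001], [0.704], [0.0006], [0.824], [0.5005], [0.51]])

# Threshold at 0.5 and flatten to a 1-D array of 0/1 labels.
predict_new = (predict >= 0.5).astype(int).ravel()
print(predict_new)  # [0 1 0 1 1 1]
```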
In [192]:
plx.imshow(confusion_matrix( y_test, predict_new), text_auto = True)
In [193]:
print(classification_report(y_test, predict_new))
              precision    recall  f1-score   support

         0.0       0.85      0.90      0.87      1062
         1.0       0.63      0.52      0.57       347

    accuracy                           0.81      1409
   macro avg       0.74      0.71      0.72      1409
weighted avg       0.80      0.81      0.80      1409

The final test accuracy for our first model is about 81%, though recall for the churn class is only 0.52.¶

Model 2¶

In [194]:
plx.imshow(df.corr(), height = 1700, width = 1700, text_auto = True)

Here we have too many features for prediction, and some features' correlations with Churn are nearly zero. Keeping them may lead to overfitting, so we should remove the less important features from the dataframe.¶
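Dropping weakly correlated features can be sketched with pandas as below. This uses a toy DataFrame standing in for the encoded churn data, and the 0.1 cutoff is an assumed threshold for illustration, not a value from the notebook:

```python
import pandas as pd

# Toy numeric DataFrame standing in for the encoded churn data.
df = pd.DataFrame({
    'tenure':       [1, 34, 2, 45, 2, 24],
    'PhoneService': [1, 1, 0, 0, 1, 0],
    'Churn':        [0, 0, 1, 0, 1, 0],
})

# Keep only the features whose |correlation| with Churn clears the threshold.
corr_with_churn = df.corr()['Churn'].abs()
keep = corr_with_churn[corr_with_churn >= 0.1].index.tolist()
print(keep)  # PhoneService is uncorrelated with Churn here, so it is dropped
```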

Dimensionality Reduction¶

In [195]:
corr = df.corr()
In [196]:
df.corr()
Out[196]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies PaperlessBilling MonthlyCharges TotalCharges Churn gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_One year Contract_Two year PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check
SeniorCitizen 1.000000 0.016479 -0.211185 0.016567 0.008576 0.142948 -0.038653 0.066572 0.059428 -0.060625 0.105378 0.120176 0.156530 0.220173 0.102994 0.150889 -0.001874 -0.108322 0.255338 -0.182742 0.138360 -0.046262 -0.117000 -0.016159 -0.024135 0.171718 -0.153477
Partner 0.016479 1.000000 0.452676 0.379697 0.017706 0.142057 0.143106 0.141498 0.153786 0.119999 0.124666 0.117412 -0.014877 0.096848 0.317540 -0.150448 -0.001808 -0.000851 0.000304 0.000615 -0.280865 0.082783 0.248091 0.110706 0.082029 -0.083852 -0.095125
Dependents -0.211185 0.452676 1.000000 0.159712 -0.001762 -0.024526 0.080972 0.023671 0.013963 0.063268 -0.016558 -0.039741 -0.111377 -0.113890 0.062136 -0.164221 0.010517 0.052010 -0.165818 0.139812 -0.231720 0.068368 0.204613 0.052021 0.060267 -0.150642 0.059071
tenure 0.016567 0.379697 0.159712 1.000000 0.008448 0.331941 0.327203 0.360277 0.360653 0.324221 0.279756 0.286111 0.006152 0.247900 0.826160 -0.352229 0.005106 0.013274 0.019720 -0.039062 -0.645561 0.202570 0.558533 0.243510 0.233006 -0.208363 -0.233852
PhoneService 0.008576 0.017706 -0.001762 0.008448 1.000000 0.279690 -0.092893 -0.052312 -0.071227 -0.096340 -0.022574 -0.032959 0.016505 0.247398 0.113207 0.011942 -0.006488 -0.452425 0.289999 0.172209 -0.000742 -0.002791 0.003519 0.007556 -0.007721 0.003062 -0.003319
MultipleLines 0.142948 0.142057 -0.024526 0.331941 0.279690 1.000000 0.098108 0.202237 0.201137 0.100571 0.257152 0.258751 0.163530 0.490434 0.468516 0.040102 -0.008414 -0.199920 0.366083 -0.210564 -0.088203 -0.003794 0.106253 0.075527 0.060048 0.083618 -0.227206
OnlineSecurity -0.038653 0.143106 0.080972 0.327203 -0.092893 0.098108 1.000000 0.283832 0.275438 0.354931 0.176207 0.187398 -0.003636 0.296594 0.411672 -0.171226 -0.017021 0.321269 -0.030696 -0.333403 -0.246679 0.100162 0.191773 0.095158 0.115721 -0.112338 -0.080798
OnlineBackup 0.066572 0.141498 0.023671 0.360277 -0.052312 0.202237 0.283832 1.000000 0.303546 0.294233 0.282106 0.274501 0.126735 0.441780 0.509246 -0.082255 -0.013773 0.157884 0.165651 -0.381593 -0.164172 0.083722 0.111400 0.087004 0.090785 -0.000408 -0.174164
DeviceProtection 0.059428 0.153786 0.013963 0.360653 -0.071227 0.201137 0.275438 0.303546 1.000000 0.333313 0.390874 0.402111 0.103797 0.482692 0.522003 -0.066160 -0.002105 0.146291 0.176049 -0.380754 -0.225662 0.102495 0.165096 0.083115 0.111554 -0.003351 -0.187373
TechSupport -0.060625 0.119999 0.063268 0.324221 -0.096340 0.100571 0.354931 0.294233 0.333313 1.000000 0.278070 0.279358 0.037880 0.338304 0.431904 -0.164674 -0.009212 0.313118 -0.020492 -0.336298 -0.285241 0.095775 0.240824 0.101252 0.117272 -0.114839 -0.085509
StreamingTV 0.105378 0.124666 -0.016558 0.279756 -0.022574 0.257152 0.176207 0.282106 0.390874 0.278070 1.000000 0.533094 0.223841 0.629603 0.514990 0.063228 -0.008393 0.016274 0.329349 -0.415552 -0.112282 0.061612 0.072049 0.046252 0.040433 0.144626 -0.247742
StreamingMovies 0.120176 0.117412 -0.039741 0.286111 -0.032959 0.258751 0.187398 0.274501 0.402111 0.279358 0.533094 1.000000 0.211716 0.627429 0.520118 0.061382 -0.010487 0.025698 0.322923 -0.418675 -0.116633 0.064926 0.073960 0.048652 0.048575 0.137966 -0.250595
PaperlessBilling 0.156530 -0.014877 -0.111377 0.006152 0.016505 0.163530 -0.003636 0.126735 0.103797 0.037880 0.223841 0.211716 1.000000 0.352150 0.158557 0.191825 -0.011754 -0.063121 0.326853 -0.321013 0.169096 -0.051391 -0.147889 -0.016332 -0.013589 0.208865 -0.205398
MonthlyCharges 0.220173 0.096848 -0.113890 0.247900 0.247398 0.490434 0.296594 0.441780 0.482692 0.338304 0.629603 0.627429 0.352150 1.000000 0.651169 0.193356 -0.014569 -0.160189 0.787066 -0.763557 0.060165 0.004904 -0.074681 0.042812 0.030550 0.271625 -0.377437
TotalCharges 0.102994 0.317540 0.062136 0.826160 0.113207 0.468516 0.411672 0.509246 0.522003 0.431904 0.514990 0.520118 0.158557 0.651169 1.000000 -0.198353 -0.000077 -0.052462 0.361636 -0.375207 -0.444311 0.170810 0.354550 0.185990 0.182910 -0.059274 -0.295726
Churn 0.150889 -0.150448 -0.164221 -0.352229 0.011942 0.040102 -0.171226 -0.082255 -0.066160 -0.164674 0.063228 0.061382 0.191825 0.193356 -0.198353 1.000000 -0.008612 -0.124214 0.308020 -0.227890 0.405103 -0.177820 -0.302253 -0.117937 -0.134302 0.301919 -0.091683
gender_Male -0.001874 -0.001808 0.010517 0.005106 -0.006488 -0.008414 -0.017021 -0.013773 -0.002105 -0.009212 -0.008393 -0.010487 -0.011754 -0.014569 -0.000077 -0.008612 1.000000 0.006568 -0.011286 0.006026 -0.003386 0.008026 -0.003695 -0.016024 0.001215 0.000752 0.013744
InternetService_DSL -0.108322 -0.000851 0.052010 0.013274 -0.452425 -0.199920 0.321269 0.157884 0.146291 0.313118 0.016274 0.025698 -0.063121 -0.160189 -0.052462 -0.124214 0.006568 1.000000 -0.640987 -0.380635 -0.065509 0.046795 0.031714 0.025476 0.051438 -0.104418 0.041899
InternetService_Fiber optic 0.255338 0.000304 -0.165818 0.019720 0.289999 0.366083 -0.030696 0.165651 0.176049 -0.020492 0.329349 0.322923 0.326853 0.787066 0.361636 0.308020 -0.011286 -0.640987 1.000000 -0.465793 0.244164 -0.076324 -0.211526 -0.022624 -0.050077 0.336410 -0.306834
InternetService_No -0.182742 0.000615 0.139812 -0.039062 0.172209 -0.210564 -0.333403 -0.381593 -0.380754 -0.336298 -0.415552 -0.418675 -0.321013 -0.763557 -0.375207 -0.227890 0.006026 -0.380635 -0.465793 1.000000 -0.218639 0.038004 0.218278 -0.002113 0.001030 -0.284917 0.321361
Contract_Month-to-month 0.138360 -0.280865 -0.231720 -0.645561 -0.000742 -0.088203 -0.246679 -0.164172 -0.225662 -0.285241 -0.112282 -0.116633 0.169096 0.060165 -0.444311 0.405103 -0.003386 -0.065509 0.244164 -0.218639 1.000000 -0.568744 -0.622633 -0.179707 -0.204145 0.331661 0.004138
Contract_One year -0.046262 0.082783 0.068368 0.202570 -0.002791 -0.003794 0.100162 0.083722 0.102495 0.095775 0.061612 0.064926 -0.051391 0.004904 0.170810 -0.177820 0.008026 0.046795 -0.076324 0.038004 -0.568744 1.000000 -0.289510 0.057451 0.067589 -0.109130 -0.000116
Contract_Two year -0.117000 0.248091 0.204613 0.558533 0.003519 0.106253 0.191773 0.111400 0.165096 0.240824 0.072049 0.073960 -0.147889 -0.074681 0.354550 -0.302253 -0.003695 0.031714 -0.211526 0.218278 -0.622633 -0.289510 1.000000 0.154471 0.173265 -0.282138 -0.004705
PaymentMethod_Bank transfer (automatic) -0.016159 0.110706 0.052021 0.243510 0.007556 0.075527 0.095158 0.087004 0.083115 0.101252 0.046252 0.048652 -0.016332 0.042812 0.185990 -0.117937 -0.016024 0.025476 -0.022624 -0.002113 -0.179707 0.057451 0.154471 1.000000 -0.278215 -0.376762 -0.288685
PaymentMethod_Credit card (automatic) -0.024135 0.082029 0.060267 0.233006 -0.007721 0.060048 0.115721 0.090785 0.111554 0.117272 0.040433 0.048575 -0.013589 0.030550 0.182910 -0.134302 0.001215 0.051438 -0.050077 0.001030 -0.204145 0.067589 0.173265 -0.278215 1.000000 -0.373322 -0.286049
PaymentMethod_Electronic check 0.171718 -0.083852 -0.150642 -0.208363 0.003062 0.083618 -0.112338 -0.000408 -0.003351 -0.114839 0.144626 0.137966 0.208865 0.271625 -0.059274 0.301919 0.000752 -0.104418 0.336410 -0.284917 0.331661 -0.109130 -0.282138 -0.376762 -0.373322 1.000000 -0.387372
PaymentMethod_Mailed check -0.153477 -0.095125 0.059071 -0.233852 -0.003319 -0.227206 -0.080798 -0.174164 -0.187373 -0.085509 -0.247742 -0.250595 -0.205398 -0.377437 -0.295726 -0.091683 0.013744 0.041899 -0.306834 0.321361 0.004138 -0.000116 -0.004705 -0.288685 -0.286049 -0.387372 1.000000
In [197]:
index = corr['Churn'].index
values = corr['Churn'].values
values = [abs(x) for x in values]
In [198]:
sort_df = pd.DataFrame()
sort_df['index'] , sort_df['values'] = list(index), list(values)
In [199]:
sort_df
Out[199]:
index values
0 SeniorCitizen 0.150889
1 Partner 0.150448
2 Dependents 0.164221
3 tenure 0.352229
4 PhoneService 0.011942
5 MultipleLines 0.040102
6 OnlineSecurity 0.171226
7 OnlineBackup 0.082255
8 DeviceProtection 0.066160
9 TechSupport 0.164674
10 StreamingTV 0.063228
11 StreamingMovies 0.061382
12 PaperlessBilling 0.191825
13 MonthlyCharges 0.193356
14 TotalCharges 0.198353
15 Churn 1.000000
16 gender_Male 0.008612
17 InternetService_DSL 0.124214
18 InternetService_Fiber optic 0.308020
19 InternetService_No 0.227890
20 Contract_Month-to-month 0.405103
21 Contract_One year 0.177820
22 Contract_Two year 0.302253
23 PaymentMethod_Bank transfer (automatic) 0.117937
24 PaymentMethod_Credit card (automatic) 0.134302
25 PaymentMethod_Electronic check 0.301919
26 PaymentMethod_Mailed check 0.091683
In [200]:
sort_df = sort_df.sort_values(by=['values'])
In [201]:
# sort_df is sorted by the column 'values'
sort_df.reset_index(inplace = True)
In [202]:
sort_df
Out[202]:
level_0 index values
0 16 gender_Male 0.008612
1 4 PhoneService 0.011942
2 5 MultipleLines 0.040102
3 11 StreamingMovies 0.061382
4 10 StreamingTV 0.063228
5 8 DeviceProtection 0.066160
6 7 OnlineBackup 0.082255
7 26 PaymentMethod_Mailed check 0.091683
8 23 PaymentMethod_Bank transfer (automatic) 0.117937
9 17 InternetService_DSL 0.124214
10 24 PaymentMethod_Credit card (automatic) 0.134302
11 1 Partner 0.150448
12 0 SeniorCitizen 0.150889
13 2 Dependents 0.164221
14 9 TechSupport 0.164674
15 6 OnlineSecurity 0.171226
16 21 Contract_One year 0.177820
17 12 PaperlessBilling 0.191825
18 13 MonthlyCharges 0.193356
19 14 TotalCharges 0.198353
20 19 InternetService_No 0.227890
21 25 PaymentMethod_Electronic check 0.301919
22 22 Contract_Two year 0.302253
23 18 InternetService_Fiber optic 0.308020
24 3 tenure 0.352229
25 20 Contract_Month-to-month 0.405103
26 15 Churn 1.000000

Next, we drop from the main dataframe (df) the first 17 features of sort_df, i.e., those least correlated with Churn.¶

In [203]:
to_remove_features = list(sort_df['index'][0 : 17])
In [204]:
to_remove_features
Out[204]:
['gender_Male',
 'PhoneService',
 'MultipleLines',
 'StreamingMovies',
 'StreamingTV',
 'DeviceProtection',
 'OnlineBackup',
 'PaymentMethod_Mailed check',
 'PaymentMethod_Bank transfer (automatic)',
 'InternetService_DSL',
 'PaymentMethod_Credit card (automatic)',
 'Partner',
 'SeniorCitizen',
 'Dependents',
 'TechSupport',
 'OnlineSecurity',
 'Contract_One year']

These are the 17 least important features of the dataset, i.e., those with the weakest absolute correlation with Churn.
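For reference, the same ranking can be built in one chained pandas expression. A minimal sketch on a toy correlation series (the names and values here are illustrative, not the notebook's actual correlations):

```python
import pandas as pd

# Toy series standing in for corr['Churn'] (values are made up)
churn_corr = pd.Series(
    {"tenure": -0.35, "PhoneService": 0.01, "MonthlyCharges": 0.19, "Churn": 1.0}
)

# Rank features by |correlation| with Churn, ascending; drop Churn itself
ranked = churn_corr.abs().drop("Churn").sort_values()

# The k weakest features (k=2 for this toy example; k=17 in the notebook)
weakest = list(ranked.index[:2])
print(weakest)  # ['PhoneService', 'MonthlyCharges']
```

This avoids building the intermediate sort_df/reset_index scaffolding by hand.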

In [205]:
df_old = df
In [206]:
df = df.drop(to_remove_features, axis = 1)
In [207]:
df
Out[207]:
tenure PaperlessBilling MonthlyCharges TotalCharges Churn InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_Two year PaymentMethod_Electronic check
0 0.013889 1.0 0.115423 0.001275 0.0 0.0 0.0 1.0 0.0 1.0
1 0.472222 0.0 0.385075 0.215867 0.0 0.0 0.0 0.0 0.0 0.0
2 0.027778 1.0 0.354229 0.010310 1.0 0.0 0.0 1.0 0.0 0.0
3 0.625000 0.0 0.239303 0.210241 0.0 0.0 0.0 0.0 0.0 0.0
4 0.027778 1.0 0.521891 0.015330 1.0 1.0 0.0 1.0 0.0 1.0
... ... ... ... ... ... ... ... ... ... ...
7038 0.333333 1.0 0.662189 0.227521 0.0 0.0 0.0 0.0 0.0 0.0
7039 1.000000 1.0 0.845274 0.847461 0.0 1.0 0.0 0.0 0.0 0.0
7040 0.152778 1.0 0.112935 0.037809 0.0 0.0 0.0 1.0 0.0 1.0
7041 0.055556 1.0 0.558706 0.033210 1.0 1.0 0.0 1.0 0.0 0.0
7042 0.916667 1.0 0.869652 0.787641 0.0 1.0 0.0 0.0 1.0 0.0

7043 rows × 10 columns

In [208]:
x_train = x_train.drop(to_remove_features, axis = 1)
x_test = x_test.drop(to_remove_features, axis = 1)

Now the 17 least important columns have been removed from x_train and x_test as well.¶

In [218]:
ANN_model_2 = Sequential()
In [219]:
# Adding Input Layer to ANN
ANN_model_2.add(Dense(units = 9, activation = 'relu'))
In [220]:
# Adding 1st Hidden Layer to the ANN
ANN_model_2.add(Dense(units = 7, activation = 'relu'))
#ANN_model_2.add(Dropout(0.3))
In [221]:
# Adding 2nd Hidden Layer to the ANN
ANN_model_2.add(Dense(units = 3, activation = 'relu'))
#ANN_model_2.add(Dropout(0.3))
In [222]:
# Adding Output Layer to the ANN
ANN_model_2.add(Dense(units = 1, activation = 'sigmoid'))
In [223]:
# Compiling the ANN model with required parameters
ANN_model_2.compile(optimizer = 'adam',
                  loss = 'binary_crossentropy',
                  metrics = ['accuracy'])
In [224]:
# Early stopping avoids running many epochs with no improvement.
# Note: this monitors training accuracy; monitoring a validation metric
# such as 'val_loss' is the more common choice.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="accuracy",
    min_delta=0.0001,
    patience=20,
    verbose=1,
    mode="auto",
    baseline=None,
    restore_best_weights=True
)
In [225]:
model_history_2 = ANN_model_2.fit(x_train, y_train, batch_size = 2, epochs = 150, validation_data = (x_test,y_test), callbacks = early_stopping )
Epoch 1/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4722 - accuracy: 0.7575 - val_loss: 0.4139 - val_accuracy: 0.8041
Epoch 2/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4427 - accuracy: 0.7904 - val_loss: 0.4095 - val_accuracy: 0.8062
Epoch 3/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4382 - accuracy: 0.7932 - val_loss: 0.4044 - val_accuracy: 0.8034
Epoch 4/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4357 - accuracy: 0.7946 - val_loss: 0.4049 - val_accuracy: 0.8020
Epoch 5/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4345 - accuracy: 0.7946 - val_loss: 0.4018 - val_accuracy: 0.8112
Epoch 6/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4334 - accuracy: 0.7948 - val_loss: 0.4033 - val_accuracy: 0.8034
Epoch 7/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4320 - accuracy: 0.7936 - val_loss: 0.3998 - val_accuracy: 0.8048
Epoch 8/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4309 - accuracy: 0.7964 - val_loss: 0.3993 - val_accuracy: 0.8077
Epoch 9/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4306 - accuracy: 0.7948 - val_loss: 0.3987 - val_accuracy: 0.8105
Epoch 10/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4308 - accuracy: 0.7934 - val_loss: 0.4002 - val_accuracy: 0.8062
Epoch 11/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4295 - accuracy: 0.7932 - val_loss: 0.4020 - val_accuracy: 0.8013
Epoch 12/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4295 - accuracy: 0.7982 - val_loss: 0.3966 - val_accuracy: 0.8091
Epoch 13/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4289 - accuracy: 0.7955 - val_loss: 0.3965 - val_accuracy: 0.8055
Epoch 14/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4290 - accuracy: 0.7991 - val_loss: 0.3985 - val_accuracy: 0.8048
Epoch 15/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4290 - accuracy: 0.7964 - val_loss: 0.3982 - val_accuracy: 0.8077
Epoch 16/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4284 - accuracy: 0.7922 - val_loss: 0.3966 - val_accuracy: 0.8062
Epoch 17/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4285 - accuracy: 0.7975 - val_loss: 0.3968 - val_accuracy: 0.8070
Epoch 18/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4272 - accuracy: 0.7966 - val_loss: 0.3964 - val_accuracy: 0.8084
Epoch 19/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4274 - accuracy: 0.7959 - val_loss: 0.3965 - val_accuracy: 0.8112
Epoch 20/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4265 - accuracy: 0.7977 - val_loss: 0.3976 - val_accuracy: 0.8062
Epoch 21/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4270 - accuracy: 0.7943 - val_loss: 0.3940 - val_accuracy: 0.8098
Epoch 22/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4279 - accuracy: 0.7977 - val_loss: 0.3947 - val_accuracy: 0.8055
Epoch 23/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4269 - accuracy: 0.7943 - val_loss: 0.3952 - val_accuracy: 0.8133
Epoch 24/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4262 - accuracy: 0.7964 - val_loss: 0.3928 - val_accuracy: 0.8091
Epoch 25/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4258 - accuracy: 0.7980 - val_loss: 0.3928 - val_accuracy: 0.8148
Epoch 26/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4257 - accuracy: 0.7994 - val_loss: 0.4012 - val_accuracy: 0.8048
Epoch 27/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4254 - accuracy: 0.7975 - val_loss: 0.3969 - val_accuracy: 0.8105
Epoch 28/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4245 - accuracy: 0.7975 - val_loss: 0.3953 - val_accuracy: 0.8062
Epoch 29/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4260 - accuracy: 0.7985 - val_loss: 0.3965 - val_accuracy: 0.8098
Epoch 30/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4248 - accuracy: 0.7977 - val_loss: 0.3962 - val_accuracy: 0.8112
Epoch 31/150
2817/2817 [==============================] - 10s 4ms/step - loss: 0.4256 - accuracy: 0.7957 - val_loss: 0.3965 - val_accuracy: 0.8112
Epoch 32/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4252 - accuracy: 0.7969 - val_loss: 0.3926 - val_accuracy: 0.8148
Epoch 33/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4254 - accuracy: 0.7964 - val_loss: 0.3959 - val_accuracy: 0.8148
Epoch 34/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4238 - accuracy: 0.7985 - val_loss: 0.3933 - val_accuracy: 0.8133
Epoch 35/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4248 - accuracy: 0.7964 - val_loss: 0.3929 - val_accuracy: 0.8183
Epoch 36/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4246 - accuracy: 0.7962 - val_loss: 0.3974 - val_accuracy: 0.8148
Epoch 37/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4235 - accuracy: 0.7966 - val_loss: 0.3948 - val_accuracy: 0.8105
Epoch 38/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4250 - accuracy: 0.7987 - val_loss: 0.3984 - val_accuracy: 0.8105
Epoch 39/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4244 - accuracy: 0.7978 - val_loss: 0.3959 - val_accuracy: 0.8190
Epoch 40/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4236 - accuracy: 0.7964 - val_loss: 0.3929 - val_accuracy: 0.8141
Epoch 41/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4242 - accuracy: 0.7969 - val_loss: 0.3964 - val_accuracy: 0.8105
Epoch 42/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4237 - accuracy: 0.8016 - val_loss: 0.3938 - val_accuracy: 0.8098
Epoch 43/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4239 - accuracy: 0.7994 - val_loss: 0.3959 - val_accuracy: 0.8077
Epoch 44/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4234 - accuracy: 0.7991 - val_loss: 0.3958 - val_accuracy: 0.8126
Epoch 45/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4237 - accuracy: 0.7987 - val_loss: 0.3953 - val_accuracy: 0.8141
Epoch 46/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4230 - accuracy: 0.7998 - val_loss: 0.3951 - val_accuracy: 0.8126
Epoch 47/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4230 - accuracy: 0.7977 - val_loss: 0.3942 - val_accuracy: 0.8098
Epoch 48/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4227 - accuracy: 0.7996 - val_loss: 0.3940 - val_accuracy: 0.8098
Epoch 49/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4227 - accuracy: 0.8001 - val_loss: 0.3942 - val_accuracy: 0.8112
Epoch 50/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4226 - accuracy: 0.7978 - val_loss: 0.3931 - val_accuracy: 0.8098
Epoch 51/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4238 - accuracy: 0.7989 - val_loss: 0.3928 - val_accuracy: 0.8119
Epoch 52/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4225 - accuracy: 0.7978 - val_loss: 0.3981 - val_accuracy: 0.8098
Epoch 53/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4238 - accuracy: 0.7994 - val_loss: 0.3957 - val_accuracy: 0.8141
Epoch 54/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4228 - accuracy: 0.7993 - val_loss: 0.3958 - val_accuracy: 0.8119
Epoch 55/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4216 - accuracy: 0.7968 - val_loss: 0.3948 - val_accuracy: 0.8126
Epoch 56/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4224 - accuracy: 0.8016 - val_loss: 0.3956 - val_accuracy: 0.8077
Epoch 57/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4227 - accuracy: 0.8017 - val_loss: 0.4001 - val_accuracy: 0.8084
Epoch 58/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4222 - accuracy: 0.7977 - val_loss: 0.3957 - val_accuracy: 0.8133
Epoch 59/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4224 - accuracy: 0.8005 - val_loss: 0.3968 - val_accuracy: 0.8070
Epoch 60/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4213 - accuracy: 0.7966 - val_loss: 0.3966 - val_accuracy: 0.8105
Epoch 61/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4216 - accuracy: 0.7991 - val_loss: 0.3990 - val_accuracy: 0.8034
Epoch 62/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4226 - accuracy: 0.8019 - val_loss: 0.3947 - val_accuracy: 0.8119
Epoch 63/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4225 - accuracy: 0.7980 - val_loss: 0.3970 - val_accuracy: 0.8105
Epoch 64/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4228 - accuracy: 0.7996 - val_loss: 0.3923 - val_accuracy: 0.8084
Epoch 65/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4222 - accuracy: 0.7991 - val_loss: 0.3966 - val_accuracy: 0.8112
Epoch 66/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4216 - accuracy: 0.8000 - val_loss: 0.3965 - val_accuracy: 0.8112
Epoch 67/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4218 - accuracy: 0.7978 - val_loss: 0.3963 - val_accuracy: 0.8133
Epoch 68/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4218 - accuracy: 0.8007 - val_loss: 0.3956 - val_accuracy: 0.8112
Epoch 69/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4214 - accuracy: 0.7980 - val_loss: 0.3948 - val_accuracy: 0.8133
Epoch 70/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4224 - accuracy: 0.8005 - val_loss: 0.3942 - val_accuracy: 0.8119
Epoch 71/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4219 - accuracy: 0.8026 - val_loss: 0.3967 - val_accuracy: 0.8077
Epoch 72/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4218 - accuracy: 0.7994 - val_loss: 0.3945 - val_accuracy: 0.8133
Epoch 73/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4212 - accuracy: 0.8007 - val_loss: 0.4045 - val_accuracy: 0.8048
Epoch 74/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4217 - accuracy: 0.8009 - val_loss: 0.3944 - val_accuracy: 0.8119
Epoch 75/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4219 - accuracy: 0.7984 - val_loss: 0.3930 - val_accuracy: 0.8119
Epoch 76/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4220 - accuracy: 0.8007 - val_loss: 0.3934 - val_accuracy: 0.8084
Epoch 77/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4210 - accuracy: 0.7980 - val_loss: 0.3951 - val_accuracy: 0.8126
Epoch 78/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4219 - accuracy: 0.7991 - val_loss: 0.3932 - val_accuracy: 0.8077
Epoch 79/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4220 - accuracy: 0.7989 - val_loss: 0.3956 - val_accuracy: 0.8133
Epoch 80/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4216 - accuracy: 0.8009 - val_loss: 0.3968 - val_accuracy: 0.8098
Epoch 81/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4215 - accuracy: 0.7993 - val_loss: 0.3945 - val_accuracy: 0.8148
Epoch 82/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4209 - accuracy: 0.8021 - val_loss: 0.3986 - val_accuracy: 0.8091
Epoch 83/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4217 - accuracy: 0.7980 - val_loss: 0.3937 - val_accuracy: 0.8141
Epoch 84/150
2817/2817 [==============================] - 12s 4ms/step - loss: 0.4208 - accuracy: 0.7984 - val_loss: 0.3963 - val_accuracy: 0.8148
Epoch 85/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4206 - accuracy: 0.7987 - val_loss: 0.3934 - val_accuracy: 0.8141
Epoch 86/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4206 - accuracy: 0.7998 - val_loss: 0.3937 - val_accuracy: 0.8162
Epoch 87/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4211 - accuracy: 0.7991 - val_loss: 0.3950 - val_accuracy: 0.8112
Epoch 88/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4204 - accuracy: 0.7991 - val_loss: 0.3925 - val_accuracy: 0.8112
Epoch 89/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4205 - accuracy: 0.8003 - val_loss: 0.3960 - val_accuracy: 0.8077
Epoch 90/150
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4200 - accuracy: 0.7989 - val_loss: 0.3917 - val_accuracy: 0.8155
Epoch 91/150
2809/2817 [============================>.] - ETA: 0s - loss: 0.4199 - accuracy: 0.8015Restoring model weights from the end of the best epoch: 71.
2817/2817 [==============================] - 11s 4ms/step - loss: 0.4204 - accuracy: 0.8012 - val_loss: 0.3932 - val_accuracy: 0.8148
Epoch 91: early stopping
In [226]:
fig_3 = go.Figure()
In [227]:
fig_3.add_trace(go.Scatter(x =np.arange(0,len(model_history_2.history['accuracy'])),
                         y = model_history_2.history['val_accuracy'],
                         mode='lines+markers',
                         name='val_accuracy'))
fig_3.add_trace(go.Scatter(x =np.arange(0,len(model_history_2.history['accuracy'])),
                         y = model_history_2.history['accuracy'],
                         mode='lines+markers',
                         name='Accuracy'))
fig_3.update_layout(title = 'ACCURACY vs VALIDATION_ACCURACY')

fig_3.update_xaxes(title_text="Epochs")
fig_3.update_yaxes(title_text="Accuracy")

fig_3.show()
In [228]:
fig_4 = go.Figure()
In [229]:
fig_4.add_trace(go.Scatter(x =np.arange(0,len(model_history_2.history['loss'])),
                         y = model_history_2.history['loss'],
                         mode='lines+markers',
                         name='loss'))
fig_4.add_trace(go.Scatter(x =np.arange(0,len(model_history_2.history['loss'])),
                         y = model_history_2.history['val_loss'],
                         mode='lines+markers',
                         name='val_loss'))
fig_4.update_layout(title = 'LOSS vs VALIDATION_LOSS')

fig_4.update_xaxes(title_text="Epochs")
fig_4.update_yaxes(title_text="Loss")

fig_4.show()
In [230]:
ANN_model_2.evaluate(x_test, y_test)
45/45 [==============================] - 0s 4ms/step - loss: 0.3967 - accuracy: 0.8077
Out[230]:
[0.3967050015926361, 0.8076649904251099]

The accuracy after removing the least important features is 80.77%.¶

Both models reach essentially the same accuracy. Keep in mind that we're working with imbalanced data; the accuracy may increase or decrease after under-/over-sampling the dataset.¶

In [237]:
predict_2 = ANN_model_2.predict(x_test)
45/45 [==============================] - 0s 2ms/step
In [238]:
predict_2
Out[238]:
array([[0.00228695],
       [0.09408851],
       [0.00781803],
       ...,
       [0.36911735],
       [0.67390597],
       [0.7055226 ]], dtype=float32)
In [239]:
# Converting the predicted probabilities into binary output
predict_new_2 = []
for x in predict_2:
    if x >= 0.5:
        predict_new_2.append(1)
    else:
        predict_new_2.append(0)
In [240]:
predict_new_2[-10 : ]
Out[240]:
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
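The thresholding loop above can also be written as a single vectorized expression. A minimal sketch with toy probabilities (not the model's actual predictions):

```python
import numpy as np

# Toy predicted probabilities in the same (n, 1) shape Keras returns
probs = np.array([[0.002], [0.094], [0.369], [0.674], [0.706]], dtype=np.float32)

# Vectorized equivalent of the loop: compare against 0.5, cast bool -> int
preds = (probs.ravel() >= 0.5).astype(int)
print(preds)  # [0 0 0 1 1]
```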
In [241]:
plx.imshow(confusion_matrix( y_test, predict_new_2), text_auto = True)
In [242]:
print(classification_report(y_test, predict_new_2))
              precision    recall  f1-score   support

         0.0       0.84      0.91      0.88      1062
         1.0       0.65      0.48      0.55       347

    accuracy                           0.81      1409
   macro avg       0.75      0.70      0.72      1409
weighted avg       0.80      0.81      0.80      1409
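As a reminder of what the report's per-class numbers mean, precision and recall can be computed directly from the confusion-matrix cells. A sketch with illustrative counts (not the notebook's actual matrix):

```python
# 2x2 confusion matrix cells (toy numbers chosen for illustration):
# tn/fp = true class 0, fn/tp = true class 1
tn, fp, fn, tp = 960, 102, 180, 167

precision = tp / (tp + fp)  # of predicted churners, how many really churned
recall = tp / (tp + fn)     # of actual churners, how many we caught
print(round(precision, 2), round(recall, 2))  # 0.62 0.48
```

The low recall on class 1 is the symptom of the class imbalance discussed next.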

The accuracy of the 2nd model is 81%¶

There is no difference after dimensionality reduction¶

We still haven't applied any technique to balance the dataset. Let's check and handle that problem.¶

In [243]:
df
Out[243]:
tenure PaperlessBilling MonthlyCharges TotalCharges Churn InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_Two year PaymentMethod_Electronic check
0 0.013889 1.0 0.115423 0.001275 0.0 0.0 0.0 1.0 0.0 1.0
1 0.472222 0.0 0.385075 0.215867 0.0 0.0 0.0 0.0 0.0 0.0
2 0.027778 1.0 0.354229 0.010310 1.0 0.0 0.0 1.0 0.0 0.0
3 0.625000 0.0 0.239303 0.210241 0.0 0.0 0.0 0.0 0.0 0.0
4 0.027778 1.0 0.521891 0.015330 1.0 1.0 0.0 1.0 0.0 1.0
... ... ... ... ... ... ... ... ... ... ...
7038 0.333333 1.0 0.662189 0.227521 0.0 0.0 0.0 0.0 0.0 0.0
7039 1.000000 1.0 0.845274 0.847461 0.0 1.0 0.0 0.0 0.0 0.0
7040 0.152778 1.0 0.112935 0.037809 0.0 0.0 0.0 1.0 0.0 1.0
7041 0.055556 1.0 0.558706 0.033210 1.0 1.0 0.0 1.0 0.0 0.0
7042 0.916667 1.0 0.869652 0.787641 0.0 1.0 0.0 0.0 1.0 0.0

7043 rows × 10 columns

In [244]:
fig = plx.histogram(df, x = 'Churn', title = "Churn", color = 'Churn')
fig.update_traces(dict(marker_line_width=0))
fig.show()

The histogram above confirms that the dataset is imbalanced¶

In [245]:
churn_0 = df.where(df['Churn'] == 0).dropna()
churn_1 = df.where(df['Churn'] == 1).dropna()
In [246]:
len(churn_0), len(churn_1)
Out[246]:
(5174, 1869)
In [247]:
len(df)
Out[247]:
7043

73.46% of the dataset (5174 records) belongs to no-churn and the remaining 26.54% (1869 records) belongs to churn¶
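The class percentages follow directly from the counts above:

```python
# Class ratio from the counts reported above (5174 no-churn, 1869 churn)
churn_counts = {0: 5174, 1: 1869}
total = sum(churn_counts.values())  # 7043
ratios = {k: round(100 * v / total, 2) for k, v in churn_counts.items()}
print(ratios)  # {0: 73.46, 1: 26.54}
```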

This is clearly an imbalanced dataset, so we have to balance it.¶

There are two common techniques: 1. under-sampling and 2. over-sampling.¶

Here we focus on over-sampling, since under-sampling discards records and loses information.¶

Let's go with over-sampling.¶

In [248]:
x_new, y_new = df_old.drop(['Churn'], axis = 1), df_old['Churn']
In [249]:
x_new
Out[249]:
SeniorCitizen Partner Dependents tenure PhoneService MultipleLines OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies PaperlessBilling MonthlyCharges TotalCharges gender_Male InternetService_DSL InternetService_Fiber optic InternetService_No Contract_Month-to-month Contract_One year Contract_Two year PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check PaymentMethod_Mailed check
0 0.0 1.0 0.0 0.013889 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.115423 0.001275 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
1 0.0 0.0 0.0 0.472222 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.385075 0.215867 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
2 0.0 0.0 0.0 0.027778 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.354229 0.010310 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
3 0.0 0.0 0.0 0.625000 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.239303 0.210241 1.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.027778 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.521891 0.015330 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7038 0.0 1.0 1.0 0.333333 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.662189 0.227521 1.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0
7039 0.0 1.0 1.0 1.000000 1.0 1.0 0.0 1.0 1.0 0.0 1.0 1.0 1.0 0.845274 0.847461 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0
7040 0.0 1.0 1.0 0.152778 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.112935 0.037809 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0
7041 1.0 1.0 0.0 0.055556 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.558706 0.033210 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
7042 0.0 0.0 0.0 0.916667 1.0 0.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.869652 0.787641 1.0 0.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0

7043 rows × 26 columns

In [250]:
y_new
Out[250]:
0       0.0
1       0.0
2       1.0
3       0.0
4       1.0
       ... 
7038    0.0
7039    0.0
7040    0.0
7041    1.0
7042    0.0
Name: Churn, Length: 7043, dtype: float64

Over Sampling¶

In [251]:
#!pip install imblearn
In [252]:
from imblearn.over_sampling import RandomOverSampler
In [253]:
ros = RandomOverSampler(sampling_strategy = 'auto')
In [254]:
x_resampled, y_resampled = ros.fit_resample(x_new, y_new)
In [255]:
len(x_resampled)
Out[255]:
10348
In [256]:
from collections import Counter
print(Counter(y_resampled))
Counter({0.0: 5174, 1.0: 5174})

Here, we over-sampled the dataset; the two classes of the dependent feature are now in a 1:1 ratio.¶

Model_3¶

In [257]:
x_train_3, x_test_3, y_train_3, y_test_3 = train_test_split( x_resampled, y_resampled, test_size = 0.2, random_state = 35)
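One caveat with over-sampling before the split, as done above: duplicated minority rows can land in both the train and test folds, which may inflate test metrics. A common alternative is to split first and over-sample only the training fold. A leakage-safe sketch on synthetic data (the variable names and shapes here are illustrative):

```python
import numpy as np
from collections import Counter
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: 80 majority (0) vs 20 minority (1) samples
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 80 + [1] * 20)

# Split FIRST, stratified so both folds keep the original class ratio
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=35
)

# Manual random over-sampling of the minority class, training fold only
minority = np.flatnonzero(y_tr == 1)
extra = rng.choice(minority, size=(y_tr == 0).sum() - len(minority), replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])
print(Counter(y_bal))  # both classes now equally represented
```

The same split-then-resample order works with imblearn's RandomOverSampler by calling fit_resample on the training fold only.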
In [258]:
ANN_model_3 = Sequential()

# Adding Input Layer to ANN
ANN_model_3.add(Dense(units = 27, activation = 'relu'))

# Adding 1st Hidden Layer to the ANN
ANN_model_3.add(Dense(units = 15, activation = 'relu'))
ANN_model_3.add(Dropout(0.4))

# Adding 2nd Hidden Layer to the ANN
ANN_model_3.add(Dense(units = 7, activation = 'relu'))
ANN_model_3.add(Dropout(0.3))

# Adding Output Layer to the ANN
ANN_model_3.add(Dense(units = 1, activation = 'sigmoid'))

ANN_model_3.compile(optimizer = 'adam',
                  loss = 'binary_crossentropy',
                  metrics = ['accuracy'])

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="accuracy",
    min_delta=0.0001,
    patience=20,
    verbose=1,
    mode="auto",
    baseline=None,
    restore_best_weights=True
)

model_history_3 = ANN_model_3.fit(x_train_3, y_train_3, batch_size = 2, epochs = 150, validation_data = (x_test_3, y_test_3), callbacks = early_stopping )
Epoch 1/150
4139/4139 [==============================] - 20s 5ms/step - loss: 0.5543 - accuracy: 0.7218 - val_loss: 0.4874 - val_accuracy: 0.7652
Epoch 2/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.5205 - accuracy: 0.7538 - val_loss: 0.4717 - val_accuracy: 0.7749
Epoch 3/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.5026 - accuracy: 0.7586 - val_loss: 0.4747 - val_accuracy: 0.7681
Epoch 4/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.5085 - accuracy: 0.7602 - val_loss: 0.4748 - val_accuracy: 0.7744
Epoch 5/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4999 - accuracy: 0.7644 - val_loss: 0.4690 - val_accuracy: 0.7797
Epoch 6/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4932 - accuracy: 0.7621 - val_loss: 0.4656 - val_accuracy: 0.7870
Epoch 7/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4875 - accuracy: 0.7665 - val_loss: 0.4592 - val_accuracy: 0.7812
Epoch 8/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4844 - accuracy: 0.7727 - val_loss: 0.4585 - val_accuracy: 0.7850
Epoch 9/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4797 - accuracy: 0.7693 - val_loss: 0.4596 - val_accuracy: 0.7845
Epoch 10/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4728 - accuracy: 0.7780 - val_loss: 0.4600 - val_accuracy: 0.7836
Epoch 11/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4680 - accuracy: 0.7770 - val_loss: 0.4744 - val_accuracy: 0.7797
Epoch 12/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4678 - accuracy: 0.7771 - val_loss: 0.4639 - val_accuracy: 0.7831
Epoch 13/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4647 - accuracy: 0.7801 - val_loss: 0.4561 - val_accuracy: 0.7899
Epoch 14/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4625 - accuracy: 0.7845 - val_loss: 0.4624 - val_accuracy: 0.7855
Epoch 15/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4631 - accuracy: 0.7873 - val_loss: 0.4637 - val_accuracy: 0.7855
Epoch 16/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4544 - accuracy: 0.7886 - val_loss: 0.4592 - val_accuracy: 0.7865
Epoch 17/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4548 - accuracy: 0.7857 - val_loss: 0.4571 - val_accuracy: 0.7850
Epoch 18/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4500 - accuracy: 0.7904 - val_loss: 0.4598 - val_accuracy: 0.7908
Epoch 19/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4477 - accuracy: 0.7910 - val_loss: 0.4624 - val_accuracy: 0.7778
Epoch 20/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4431 - accuracy: 0.7957 - val_loss: 0.4600 - val_accuracy: 0.7903
Epoch 21/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4465 - accuracy: 0.7903 - val_loss: 0.4544 - val_accuracy: 0.7797
Epoch 22/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4414 - accuracy: 0.7893 - val_loss: 0.4585 - val_accuracy: 0.7870
Epoch 23/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4414 - accuracy: 0.7922 - val_loss: 0.4600 - val_accuracy: 0.7889
Epoch 24/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4439 - accuracy: 0.7973 - val_loss: 0.4569 - val_accuracy: 0.7879
Epoch 25/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4364 - accuracy: 0.7933 - val_loss: 0.4550 - val_accuracy: 0.7913
Epoch 26/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4352 - accuracy: 0.7986 - val_loss: 0.4548 - val_accuracy: 0.7816
Epoch 27/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4291 - accuracy: 0.7955 - val_loss: 0.4558 - val_accuracy: 0.7836
Epoch 28/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4321 - accuracy: 0.7942 - val_loss: 0.4559 - val_accuracy: 0.7845
Epoch 29/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4280 - accuracy: 0.7964 - val_loss: 0.4611 - val_accuracy: 0.7908
Epoch 30/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4301 - accuracy: 0.7983 - val_loss: 0.4646 - val_accuracy: 0.7865
Epoch 31/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4265 - accuracy: 0.7975 - val_loss: 0.4624 - val_accuracy: 0.7845
Epoch 32/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4234 - accuracy: 0.7948 - val_loss: 0.4601 - val_accuracy: 0.7826
Epoch 33/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4207 - accuracy: 0.8026 - val_loss: 0.4609 - val_accuracy: 0.7918
Epoch 34/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4252 - accuracy: 0.8008 - val_loss: 0.4634 - val_accuracy: 0.7889
Epoch 35/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4194 - accuracy: 0.8010 - val_loss: 0.4630 - val_accuracy: 0.7865
Epoch 36/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4197 - accuracy: 0.8015 - val_loss: 0.4610 - val_accuracy: 0.7850
Epoch 37/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4201 - accuracy: 0.7997 - val_loss: 0.4634 - val_accuracy: 0.7874
Epoch 38/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4180 - accuracy: 0.8036 - val_loss: 0.4590 - val_accuracy: 0.7807
Epoch 39/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4186 - accuracy: 0.7997 - val_loss: 0.4664 - val_accuracy: 0.7870
Epoch 40/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4172 - accuracy: 0.8026 - val_loss: 0.4662 - val_accuracy: 0.7947
Epoch 41/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4175 - accuracy: 0.8009 - val_loss: 0.4649 - val_accuracy: 0.7787
Epoch 42/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4163 - accuracy: 0.8019 - val_loss: 0.4587 - val_accuracy: 0.7889
Epoch 43/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4153 - accuracy: 0.7986 - val_loss: 0.4574 - val_accuracy: 0.7855
Epoch 44/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4135 - accuracy: 0.8021 - val_loss: 0.4567 - val_accuracy: 0.7874
Epoch 45/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4100 - accuracy: 0.8032 - val_loss: 0.4618 - val_accuracy: 0.7768
Epoch 46/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4108 - accuracy: 0.8014 - val_loss: 0.4570 - val_accuracy: 0.7932
Epoch 47/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4137 - accuracy: 0.7992 - val_loss: 0.4602 - val_accuracy: 0.7928
Epoch 48/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4113 - accuracy: 0.8014 - val_loss: 0.4633 - val_accuracy: 0.7932
Epoch 49/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4106 - accuracy: 0.8085 - val_loss: 0.4719 - val_accuracy: 0.7923
Epoch 50/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4133 - accuracy: 0.8024 - val_loss: 0.4722 - val_accuracy: 0.7879
Epoch 51/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4072 - accuracy: 0.8065 - val_loss: 0.4618 - val_accuracy: 0.7952
Epoch 52/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4149 - accuracy: 0.7987 - val_loss: 0.4667 - val_accuracy: 0.7855
Epoch 53/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4102 - accuracy: 0.7993 - val_loss: 0.4609 - val_accuracy: 0.7831
Epoch 54/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4044 - accuracy: 0.8060 - val_loss: 0.4737 - val_accuracy: 0.7913
Epoch 55/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4088 - accuracy: 0.8050 - val_loss: 0.4579 - val_accuracy: 0.7908
Epoch 56/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4105 - accuracy: 0.8070 - val_loss: 0.4626 - val_accuracy: 0.7976
Epoch 57/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4045 - accuracy: 0.8067 - val_loss: 0.4712 - val_accuracy: 0.7976
Epoch 58/150
4139/4139 [==============================] - 22s 5ms/step - loss: 0.4076 - accuracy: 0.8068 - val_loss: 0.4731 - val_accuracy: 0.7894
Epoch 59/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4038 - accuracy: 0.8101 - val_loss: 0.4543 - val_accuracy: 0.7874
Epoch 60/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4069 - accuracy: 0.8068 - val_loss: 0.4690 - val_accuracy: 0.7923
Epoch 61/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4095 - accuracy: 0.8077 - val_loss: 0.4561 - val_accuracy: 0.7850
Epoch 62/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4022 - accuracy: 0.8054 - val_loss: 0.4714 - val_accuracy: 0.7947
Epoch 63/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4082 - accuracy: 0.8043 - val_loss: 0.4884 - val_accuracy: 0.7942
Epoch 64/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4059 - accuracy: 0.8070 - val_loss: 0.4682 - val_accuracy: 0.7870
Epoch 65/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4043 - accuracy: 0.8079 - val_loss: 0.4684 - val_accuracy: 0.7947
Epoch 66/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4040 - accuracy: 0.8070 - val_loss: 0.4671 - val_accuracy: 0.7971
Epoch 67/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4008 - accuracy: 0.8085 - val_loss: 0.4582 - val_accuracy: 0.7903
Epoch 68/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4017 - accuracy: 0.8059 - val_loss: 0.4787 - val_accuracy: 0.8005
Epoch 69/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.4026 - accuracy: 0.8120 - val_loss: 0.4569 - val_accuracy: 0.7899
Epoch 70/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3977 - accuracy: 0.8085 - val_loss: 0.4774 - val_accuracy: 0.7908
Epoch 71/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4011 - accuracy: 0.8114 - val_loss: 0.4534 - val_accuracy: 0.7928
Epoch 72/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.3983 - accuracy: 0.8100 - val_loss: 0.4745 - val_accuracy: 0.8000
Epoch 73/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4015 - accuracy: 0.8114 - val_loss: 0.4581 - val_accuracy: 0.7918
Epoch 74/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4025 - accuracy: 0.8140 - val_loss: 0.4667 - val_accuracy: 0.7899
Epoch 75/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.4036 - accuracy: 0.8086 - val_loss: 0.4606 - val_accuracy: 0.7841
Epoch 76/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3946 - accuracy: 0.8119 - val_loss: 0.4566 - val_accuracy: 0.7923
Epoch 77/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3984 - accuracy: 0.8118 - val_loss: 0.4669 - val_accuracy: 0.8024
Epoch 78/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.3987 - accuracy: 0.8118 - val_loss: 0.4753 - val_accuracy: 0.7884
Epoch 79/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3976 - accuracy: 0.8123 - val_loss: 0.4696 - val_accuracy: 0.7986
Epoch 80/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3971 - accuracy: 0.8128 - val_loss: 0.4660 - val_accuracy: 0.7899
Epoch 81/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3997 - accuracy: 0.8113 - val_loss: 0.4617 - val_accuracy: 0.7995
Epoch 82/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.4007 - accuracy: 0.8100 - val_loss: 0.4524 - val_accuracy: 0.7966
Epoch 83/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3924 - accuracy: 0.8161 - val_loss: 0.4778 - val_accuracy: 0.7976
Epoch 84/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3920 - accuracy: 0.8136 - val_loss: 0.5107 - val_accuracy: 0.7966
Epoch 85/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3994 - accuracy: 0.8157 - val_loss: 0.4620 - val_accuracy: 0.7923
Epoch 86/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3993 - accuracy: 0.8089 - val_loss: 0.4657 - val_accuracy: 0.7908
Epoch 87/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3929 - accuracy: 0.8148 - val_loss: 0.4649 - val_accuracy: 0.8019
Epoch 88/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3984 - accuracy: 0.8123 - val_loss: 0.4707 - val_accuracy: 0.7947
Epoch 89/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3939 - accuracy: 0.8165 - val_loss: 0.4627 - val_accuracy: 0.7976
Epoch 90/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3916 - accuracy: 0.8148 - val_loss: 0.4733 - val_accuracy: 0.7961
Epoch 91/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3977 - accuracy: 0.8134 - val_loss: 0.4663 - val_accuracy: 0.7821
Epoch 92/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3974 - accuracy: 0.8140 - val_loss: 0.4833 - val_accuracy: 0.7913
Epoch 93/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.3917 - accuracy: 0.8158 - val_loss: 0.4947 - val_accuracy: 0.7995
Epoch 94/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3947 - accuracy: 0.8157 - val_loss: 0.4836 - val_accuracy: 0.7928
Epoch 95/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3974 - accuracy: 0.8109 - val_loss: 0.4750 - val_accuracy: 0.7899
Epoch 96/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3903 - accuracy: 0.8147 - val_loss: 0.4873 - val_accuracy: 0.8034
Epoch 97/150
4139/4139 [==============================] - 24s 6ms/step - loss: 0.3961 - accuracy: 0.8170 - val_loss: 0.4743 - val_accuracy: 0.7889
Epoch 98/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3940 - accuracy: 0.8136 - val_loss: 0.4664 - val_accuracy: 0.7913
Epoch 99/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3971 - accuracy: 0.8144 - val_loss: 0.4689 - val_accuracy: 0.7942
Epoch 100/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3863 - accuracy: 0.8149 - val_loss: 0.4742 - val_accuracy: 0.8034
Epoch 101/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3973 - accuracy: 0.8138 - val_loss: 0.4677 - val_accuracy: 0.7995
Epoch 102/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3876 - accuracy: 0.8177 - val_loss: 0.4730 - val_accuracy: 0.7932
Epoch 103/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3942 - accuracy: 0.8171 - val_loss: 0.4546 - val_accuracy: 0.7995
Epoch 104/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3973 - accuracy: 0.8094 - val_loss: 0.4760 - val_accuracy: 0.7952
Epoch 105/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3959 - accuracy: 0.8107 - val_loss: 0.4696 - val_accuracy: 0.8034
Epoch 106/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3887 - accuracy: 0.8192 - val_loss: 0.4790 - val_accuracy: 0.8005
Epoch 107/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3982 - accuracy: 0.8126 - val_loss: 0.4753 - val_accuracy: 0.8005
Epoch 108/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3901 - accuracy: 0.8182 - val_loss: 0.5050 - val_accuracy: 0.8024
Epoch 109/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3906 - accuracy: 0.8147 - val_loss: 0.4932 - val_accuracy: 0.7995
Epoch 110/150
4139/4139 [==============================] - 22s 5ms/step - loss: 0.3933 - accuracy: 0.8187 - val_loss: 0.4880 - val_accuracy: 0.7976
Epoch 111/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3890 - accuracy: 0.8157 - val_loss: 0.4825 - val_accuracy: 0.8048
Epoch 112/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3934 - accuracy: 0.8166 - val_loss: 0.4522 - val_accuracy: 0.7976
Epoch 113/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3927 - accuracy: 0.8173 - val_loss: 0.4651 - val_accuracy: 0.7903
Epoch 114/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3940 - accuracy: 0.8137 - val_loss: 0.4699 - val_accuracy: 0.7976
Epoch 115/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3881 - accuracy: 0.8178 - val_loss: 0.4644 - val_accuracy: 0.7894
Epoch 116/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3904 - accuracy: 0.8164 - val_loss: 0.5028 - val_accuracy: 0.7899
Epoch 117/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3874 - accuracy: 0.8219 - val_loss: 0.4763 - val_accuracy: 0.8000
Epoch 118/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3937 - accuracy: 0.8177 - val_loss: 0.4695 - val_accuracy: 0.7971
Epoch 119/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3805 - accuracy: 0.8230 - val_loss: 0.4868 - val_accuracy: 0.8053
Epoch 120/150
4139/4139 [==============================] - 19s 4ms/step - loss: 0.3941 - accuracy: 0.8167 - val_loss: 0.4712 - val_accuracy: 0.7971
Epoch 121/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3838 - accuracy: 0.8221 - val_loss: 0.4761 - val_accuracy: 0.8005
Epoch 122/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3895 - accuracy: 0.8204 - val_loss: 0.4834 - val_accuracy: 0.8029
Epoch 123/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3903 - accuracy: 0.8154 - val_loss: 0.4779 - val_accuracy: 0.8005
Epoch 124/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3876 - accuracy: 0.8211 - val_loss: 0.4721 - val_accuracy: 0.8019
Epoch 125/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3849 - accuracy: 0.8209 - val_loss: 0.4827 - val_accuracy: 0.7976
Epoch 126/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3942 - accuracy: 0.8172 - val_loss: 0.4762 - val_accuracy: 0.7889
Epoch 127/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3873 - accuracy: 0.8193 - val_loss: 0.5000 - val_accuracy: 0.7971
Epoch 128/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3882 - accuracy: 0.8184 - val_loss: 0.4710 - val_accuracy: 0.8014
Epoch 129/150
4139/4139 [==============================] - 19s 5ms/step - loss: 0.3845 - accuracy: 0.8196 - val_loss: 0.4926 - val_accuracy: 0.8019
Epoch 130/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3850 - accuracy: 0.8202 - val_loss: 0.5442 - val_accuracy: 0.7976
Epoch 131/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3844 - accuracy: 0.8187 - val_loss: 0.5249 - val_accuracy: 0.7937
Epoch 132/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3854 - accuracy: 0.8187 - val_loss: 0.5153 - val_accuracy: 0.7961
Epoch 133/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3885 - accuracy: 0.8170 - val_loss: 0.4789 - val_accuracy: 0.7981
Epoch 134/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3820 - accuracy: 0.8193 - val_loss: 0.4811 - val_accuracy: 0.7932
Epoch 135/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3888 - accuracy: 0.8161 - val_loss: 0.4679 - val_accuracy: 0.7860
Epoch 136/150
4139/4139 [==============================] - 18s 4ms/step - loss: 0.3838 - accuracy: 0.8161 - val_loss: 0.5319 - val_accuracy: 0.7899
Epoch 137/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3883 - accuracy: 0.8166 - val_loss: 0.5034 - val_accuracy: 0.7990
Epoch 138/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.3859 - accuracy: 0.8178 - val_loss: 0.4971 - val_accuracy: 0.8000
Epoch 139/150
4134/4139 [============================>.] - ETA: 0s - loss: 0.3864 - accuracy: 0.8175Restoring model weights from the end of the best epoch: 119.
4139/4139 [==============================] - 17s 4ms/step - loss: 0.3866 - accuracy: 0.8173 - val_loss: 0.4838 - val_accuracy: 0.7976
Epoch 139: early stopping
In [259]:
ANN_model_3.evaluate(x_test_3, y_test_3)
65/65 [==============================] - 0s 3ms/step - loss: 0.4868 - accuracy: 0.8053
Out[259]:
[0.4867652654647827, 0.8053140044212341]
In [260]:
fig_5 = go.Figure()

fig_5.add_trace(go.Scatter(x=np.arange(len(model_history_3.history['val_accuracy'])),
                           y=model_history_3.history['val_accuracy'],
                           mode='lines+markers',
                           name='val_accuracy'))
fig_5.add_trace(go.Scatter(x=np.arange(len(model_history_3.history['accuracy'])),
                           y=model_history_3.history['accuracy'],
                           mode='lines+markers',
                           name='Accuracy'))
fig_5.update_layout(title = 'ACCURACY vs VALIDATION_ACCURACY')

fig_5.update_xaxes(title_text="Epochs")
fig_5.update_yaxes(title_text="Accuracy")

fig_5.show()
In [261]:
fig_6 = go.Figure()

fig_6.add_trace(go.Scatter(x=np.arange(len(model_history_3.history['loss'])),
                           y=model_history_3.history['loss'],
                           mode='lines+markers',
                           name='loss'))
fig_6.add_trace(go.Scatter(x=np.arange(len(model_history_3.history['val_loss'])),
                           y=model_history_3.history['val_loss'],
                           mode='lines+markers',
                           name='val_loss'))
fig_6.update_layout(title = 'LOSS vs VALIDATION_LOSS')

fig_6.update_xaxes(title_text="Epochs")
fig_6.update_yaxes(title_text="Loss")

fig_6.show()
In [263]:
predict_3 = ANN_model_3.predict(x_test_3)

predict_new_3 = []
for x in predict_3:
    if x >= 0.5:
        predict_new_3.append(1)
    else:
        predict_new_3.append(0)   
65/65 [==============================] - 0s 2ms/step
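The thresholding loop above can be collapsed into a single vectorized NumPy expression; the sketch below uses made-up probabilities in place of the model's actual output:

```python
import numpy as np

# Hypothetical sigmoid outputs standing in for ANN_model_3.predict(x_test_3):
# one probability per test record, shaped (n, 1).
predict_3 = np.array([[0.12], [0.87], [0.50], [0.49]])

# Compare against the 0.5 cut-off, cast booleans to ints, flatten to 1-D.
predict_new_3 = (predict_3 >= 0.5).astype(int).ravel()

print(predict_new_3)  # [0 1 1 0]
```

This produces the same labels as the loop, and the flat integer array drops straight into `confusion_matrix` and `classification_report`.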
In [264]:
plx.imshow(confusion_matrix( y_test_3, predict_new_3), text_auto = True)
In [265]:
print(classification_report(y_test_3, predict_new_3))
              precision    recall  f1-score   support

         0.0       0.89      0.70      0.78      1033
         1.0       0.75      0.91      0.82      1037

    accuracy                           0.81      2070
   macro avg       0.82      0.81      0.80      2070
weighted avg       0.82      0.81      0.80      2070

The accuracy remains the same (81%), but precision and recall improved here after the oversampling technique.¶
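The per-class precision and recall in the report come directly from the confusion-matrix counts; a minimal sketch with toy labels (not the notebook's data), where 1 = churned and 0 = stayed:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy ground truth and predictions for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() yields counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision = TP / (TP + FP); recall = TP / (TP + FN) for the churn class.
precision = tp / (tp + fp)
recall = tp / (tp + fn)

assert precision == precision_score(y_true, y_pred)
assert recall == recall_score(y_true, y_pred)
print(precision, recall)  # 0.75 0.75
```

Seeing the counts this way makes the trade-off explicit: oversampling raised recall on the churn class (fewer false negatives) at some cost in precision.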

Model_4¶

Let's check the accuracy after removing the least important features...¶

Oversampling after removing some of the least important features...¶

In [266]:
x_train_3_new = x_train_3.drop(to_remove_features, axis = 1)
x_test_3_new = x_test_3.drop(to_remove_features, axis = 1)
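`to_remove_features` was built earlier in the notebook; one common way to rank features before dropping the weakest ones (an illustration, not necessarily the notebook's exact method) is a tree-based importance score:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy frame standing in for the training data.
X = pd.DataFrame({
    "tenure":         [1, 34, 2, 45, 2, 8],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70, 99.65],
    "SeniorCitizen":  [0, 0, 0, 0, 0, 1],
})
y = [0, 0, 1, 0, 1, 1]

rf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns)

# Candidate columns to drop: everything below a chosen threshold (0.1 here
# is arbitrary); the real to_remove_features list was chosen earlier.
to_remove_example = importances[importances < 0.1].index.tolist()
print(importances.sort_values(ascending=False))
```

The importances sum to 1, so the threshold is a relative cut-off rather than an absolute score.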
In [268]:
ANN_model_4 = Sequential()

# Adding Input Layer to ANN
ANN_model_4.add(Dense(units = 9, activation = 'relu'))

# Adding 1st Hidden Layer to the ANN
ANN_model_4.add(Dense(units = 7, activation = 'relu'))
ANN_model_4.add(Dropout(0.3))

# Adding 2nd Hidden Layer to the ANN
ANN_model_4.add(Dense(units = 3, activation = 'relu'))
ANN_model_4.add(Dropout(0.3))

# Adding Output Layer to the ANN
ANN_model_4.add(Dense(units = 1, activation = 'sigmoid'))

ANN_model_4.compile(optimizer = 'adam',
                  loss = 'binary_crossentropy',
                  metrics = ['accuracy'])

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="accuracy",
    min_delta=0.0001,
    patience=20,
    verbose=1,
    mode="auto",
    baseline=None,
    restore_best_weights=True
)

model_history_4 = ANN_model_4.fit(x_train_3_new, y_train_3, batch_size = 2, epochs = 150, validation_data = (x_test_3_new, y_test_3), callbacks = early_stopping )
Epoch 1/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.6040 - accuracy: 0.6622 - val_loss: 0.5274 - val_accuracy: 0.7594
Epoch 2/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.5670 - accuracy: 0.7050 - val_loss: 0.5117 - val_accuracy: 0.7609
Epoch 3/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5690 - accuracy: 0.7021 - val_loss: 0.5062 - val_accuracy: 0.7647
Epoch 4/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.5616 - accuracy: 0.7114 - val_loss: 0.4981 - val_accuracy: 0.7623
Epoch 5/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.5528 - accuracy: 0.7104 - val_loss: 0.4930 - val_accuracy: 0.7609
Epoch 6/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.5539 - accuracy: 0.7125 - val_loss: 0.5025 - val_accuracy: 0.7604
Epoch 7/150
4139/4139 [==============================] - 17s 4ms/step - loss: 0.5501 - accuracy: 0.7165 - val_loss: 0.4955 - val_accuracy: 0.7657
Epoch 8/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5504 - accuracy: 0.7108 - val_loss: 0.4920 - val_accuracy: 0.7686
Epoch 9/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5517 - accuracy: 0.7119 - val_loss: 0.4947 - val_accuracy: 0.7691
Epoch 10/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5523 - accuracy: 0.7113 - val_loss: 0.4922 - val_accuracy: 0.7696
Epoch 11/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5517 - accuracy: 0.7171 - val_loss: 0.4958 - val_accuracy: 0.7681
Epoch 12/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5506 - accuracy: 0.7167 - val_loss: 0.4878 - val_accuracy: 0.7676
Epoch 13/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5538 - accuracy: 0.7144 - val_loss: 0.5013 - val_accuracy: 0.7729
Epoch 14/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5467 - accuracy: 0.7178 - val_loss: 0.4856 - val_accuracy: 0.7599
Epoch 15/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5473 - accuracy: 0.7195 - val_loss: 0.4957 - val_accuracy: 0.7662
Epoch 16/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5459 - accuracy: 0.7148 - val_loss: 0.5051 - val_accuracy: 0.7758
Epoch 17/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5505 - accuracy: 0.7138 - val_loss: 0.4956 - val_accuracy: 0.7705
Epoch 18/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5475 - accuracy: 0.7138 - val_loss: 0.4894 - val_accuracy: 0.7652
Epoch 19/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5450 - accuracy: 0.7187 - val_loss: 0.4949 - val_accuracy: 0.7647
Epoch 20/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5496 - accuracy: 0.7142 - val_loss: 0.4929 - val_accuracy: 0.7643
Epoch 21/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5413 - accuracy: 0.7180 - val_loss: 0.4989 - val_accuracy: 0.7734
Epoch 22/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5471 - accuracy: 0.7150 - val_loss: 0.4961 - val_accuracy: 0.7725
Epoch 23/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5427 - accuracy: 0.7228 - val_loss: 0.4950 - val_accuracy: 0.7686
Epoch 24/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5408 - accuracy: 0.7222 - val_loss: 0.4952 - val_accuracy: 0.7657
Epoch 25/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5468 - accuracy: 0.7196 - val_loss: 0.4937 - val_accuracy: 0.7725
Epoch 26/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5452 - accuracy: 0.7174 - val_loss: 0.4961 - val_accuracy: 0.7681
Epoch 27/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5446 - accuracy: 0.7162 - val_loss: 0.4947 - val_accuracy: 0.7749
Epoch 28/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5426 - accuracy: 0.7166 - val_loss: 0.4976 - val_accuracy: 0.7725
Epoch 29/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5434 - accuracy: 0.7144 - val_loss: 0.5031 - val_accuracy: 0.7686
Epoch 30/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5423 - accuracy: 0.7167 - val_loss: 0.5033 - val_accuracy: 0.7778
Epoch 31/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5451 - accuracy: 0.7162 - val_loss: 0.4872 - val_accuracy: 0.7729
Epoch 32/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5472 - accuracy: 0.7125 - val_loss: 0.4969 - val_accuracy: 0.7691
Epoch 33/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5419 - accuracy: 0.7193 - val_loss: 0.4846 - val_accuracy: 0.7734
Epoch 34/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5455 - accuracy: 0.7150 - val_loss: 0.4932 - val_accuracy: 0.7681
Epoch 35/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5376 - accuracy: 0.7195 - val_loss: 0.4957 - val_accuracy: 0.7710
Epoch 36/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5479 - accuracy: 0.7139 - val_loss: 0.5128 - val_accuracy: 0.7662
Epoch 37/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5440 - accuracy: 0.7232 - val_loss: 0.4970 - val_accuracy: 0.7691
Epoch 38/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5457 - accuracy: 0.7172 - val_loss: 0.4958 - val_accuracy: 0.7700
Epoch 39/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5422 - accuracy: 0.7188 - val_loss: 0.5018 - val_accuracy: 0.7749
Epoch 40/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5431 - accuracy: 0.7153 - val_loss: 0.5043 - val_accuracy: 0.7705
Epoch 41/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5431 - accuracy: 0.7122 - val_loss: 0.4928 - val_accuracy: 0.7754
Epoch 42/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5387 - accuracy: 0.7270 - val_loss: 0.5048 - val_accuracy: 0.7681
Epoch 43/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5376 - accuracy: 0.7184 - val_loss: 0.4964 - val_accuracy: 0.7671
Epoch 44/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5426 - accuracy: 0.7185 - val_loss: 0.4961 - val_accuracy: 0.7710
Epoch 45/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5446 - accuracy: 0.7193 - val_loss: 0.5010 - val_accuracy: 0.7686
Epoch 46/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5462 - accuracy: 0.7183 - val_loss: 0.4978 - val_accuracy: 0.7778
Epoch 47/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5435 - accuracy: 0.7171 - val_loss: 0.4882 - val_accuracy: 0.7754
Epoch 48/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5409 - accuracy: 0.7235 - val_loss: 0.5009 - val_accuracy: 0.7773
Epoch 49/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5427 - accuracy: 0.7235 - val_loss: 0.4938 - val_accuracy: 0.7720
Epoch 50/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5446 - accuracy: 0.7168 - val_loss: 0.4988 - val_accuracy: 0.7773
Epoch 51/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5434 - accuracy: 0.7201 - val_loss: 0.4999 - val_accuracy: 0.7691
Epoch 52/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5478 - accuracy: 0.7222 - val_loss: 0.5020 - val_accuracy: 0.7729
Epoch 53/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5429 - accuracy: 0.7184 - val_loss: 0.4853 - val_accuracy: 0.7758
Epoch 54/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5409 - accuracy: 0.7183 - val_loss: 0.4887 - val_accuracy: 0.7773
Epoch 55/150
4139/4139 [==============================] - 16s 4ms/step - loss: 0.5436 - accuracy: 0.7214 - val_loss: 0.4972 - val_accuracy: 0.7725
Epoch 56/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5425 - accuracy: 0.7161 - val_loss: 0.4982 - val_accuracy: 0.7749
Epoch 57/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5413 - accuracy: 0.7225 - val_loss: 0.4922 - val_accuracy: 0.7734
Epoch 58/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5429 - accuracy: 0.7203 - val_loss: 0.4894 - val_accuracy: 0.7758
Epoch 59/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5355 - accuracy: 0.7257 - val_loss: 0.4955 - val_accuracy: 0.7696
Epoch 60/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5441 - accuracy: 0.7199 - val_loss: 0.5041 - val_accuracy: 0.7787
Epoch 61/150
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5337 - accuracy: 0.7241 - val_loss: 0.4973 - val_accuracy: 0.7700
Epoch 62/150
4122/4139 [============================>.] - ETA: 0s - loss: 0.5386 - accuracy: 0.7261Restoring model weights from the end of the best epoch: 42.
4139/4139 [==============================] - 15s 4ms/step - loss: 0.5387 - accuracy: 0.7257 - val_loss: 0.4959 - val_accuracy: 0.7628
Epoch 62: early stopping
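Note that the `EarlyStopping` callback above monitors training accuracy, so training can continue while validation performance drifts, as the logs suggest. Monitoring a validation metric is the more common choice; a sketch of that alternative configuration (not what the notebook ran):

```python
import tensorflow as tf

# Stop when validation loss stops improving, so the restored weights
# reflect generalization rather than fit to the training set.
early_stopping_val = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    min_delta=0.0001,
    patience=20,
    mode="min",                 # lower validation loss is better
    restore_best_weights=True,
)
```

This would be passed to `fit` via `callbacks=[early_stopping_val]` exactly as before.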
In [269]:
fig_7 = go.Figure()

fig_7.add_trace(go.Scatter(x=np.arange(len(model_history_4.history['val_accuracy'])),
                           y=model_history_4.history['val_accuracy'],
                           mode='lines+markers',
                           name='val_accuracy'))
fig_7.add_trace(go.Scatter(x=np.arange(len(model_history_4.history['accuracy'])),
                           y=model_history_4.history['accuracy'],
                           mode='lines+markers',
                           name='Accuracy'))
fig_7.update_layout(title = 'ACCURACY vs VALIDATION_ACCURACY')

fig_7.update_xaxes(title_text="Epochs")
fig_7.update_yaxes(title_text="Accuracy")

fig_7.show()
In [270]:
fig_8 = go.Figure()

fig_8.add_trace(go.Scatter(x=np.arange(len(model_history_4.history['loss'])),
                           y=model_history_4.history['loss'],
                           mode='lines+markers',
                           name='loss'))
fig_8.add_trace(go.Scatter(x=np.arange(len(model_history_4.history['val_loss'])),
                           y=model_history_4.history['val_loss'],
                           mode='lines+markers',
                           name='val_loss'))
fig_8.update_layout(title = 'LOSS vs VALIDATION_LOSS')

fig_8.update_xaxes(title_text="Epochs")
fig_8.update_yaxes(title_text="Loss")

fig_8.show()
In [275]:
predict_4 = ANN_model_4.predict(x_test_3_new)

predict_new_4 = []
for x in predict_4:
    if x >= 0.5:
        predict_new_4.append(1)
    else:
        predict_new_4.append(0)   
65/65 [==============================] - 0s 1ms/step
In [276]:
plx.imshow(confusion_matrix(y_test_3, predict_new_4), text_auto = True)
In [278]:
print(classification_report(y_test_3, predict_new_4))
              precision    recall  f1-score   support

         0.0       0.76      0.79      0.77      1033
         1.0       0.78      0.74      0.76      1037

    accuracy                           0.77      2070
   macro avg       0.77      0.77      0.77      2070
weighted avg       0.77      0.77      0.77      2070

After applying both oversampling and dimensionality reduction, we got an accuracy of 77%.¶

That is 4% lower than the previous model, where only oversampling was done.¶

So we pick the third model as the final one: oversampling without dimensionality reduction.¶

In [279]:
import pickle as pkl
In [280]:
with open("Telco_customer_churn_prediction_model.pkl", "wb") as f:
    pkl.dump(ANN_model_3, f)
INFO:tensorflow:Assets written to: ram://783be2b1-7c12-4fdb-b82e-2e8b551590e0/assets

Our model has been stored in a pickle file successfully.¶
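Pickling happened to work here because TensorFlow serialized the model's assets behind the scenes (hence the `INFO:tensorflow:Assets written to ram://...` message), but Keras's own `model.save` / `load_model` is the documented way to persist a model. A minimal sketch with a stand-in model and an illustrative file name:

```python
import tensorflow as tf

# Tiny stand-in model; in the notebook this would be ANN_model_3.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Save and reload with Keras's native serialization (HDF5 needs h5py).
model.save('telco_customer_churn_model.h5')
reloaded = tf.keras.models.load_model('telco_customer_churn_model.h5')
```

Unlike pickle, this format preserves the architecture, weights, and optimizer state in a way that stays loadable across environments.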